PURPOSE


"The objective of this research study and experimental investigation is to determine techniques and criteria for the later development of a semi-automatic imagery screening device. Screening is defined as 'Gross selection, early in the total interpretation process, to identify those areas in the total supply of imagery which meet the minimum qualifications for further interpretation by a human.'"

The work performed during the reporting interval included the following:

1. Design of a computer simulation experiment to evaluate candidate semi-automatic imagery screening techniques.

2. Investigation of the feasibility of implementation of alternative techniques.

3. Study of the application of statistical decision theory to classification of patterns in reconnaissance imagery.

Philco is doing related work under Contract AF 30(602)-2793, "The Prenormalization of Reconnaissance Data." The objective of that program is the development of techniques for the normalization of target images on aerial reconnaissance photographs before they are used as stimuli in an adaptive-memory type of recognition device.


ABSTRACT


This report covers the work performed under Contract DA-36-039-SC-90742, "Semi-Automatic Imagery Screening," during the six-month period 1 September 1962 through 28 February 1963, comprising the fourth through ninth months of a 21-month program.

The following subjects are presented herein:

(a) Discussion of the merits of available techniques that may be considered for use in each stage of the process of screening aerial reconnaissance photographs for purposes of detecting objects of military significance.

(b) Description and comparison of alternative hardware realizations of the techniques of (a) above.

(c) Summary of the application of the principles of statistical decision theory to the reconnaissance photoscreening problem.

(d) Description of the conceptual design of an imagery screening device.

(e) Detailed description of a computer simulation problem on the detection of tanks in aerial photographs. The simulation, to be carried out during the next six months, is expected to provide design details to extend the conceptual design of item (d).


GLOSSARY OF SYMBOLS

Sections 3 and 4

Symbol            Equation      Definition

e(t)              3-2           the noisy input
e_i(t)            -             the value of e(t) at the i-th tap of the matched filter at time t
f_j               Figure 3-1    the output of the j-th threshold unit
f(t)              3-3           the matched-filter output for e(t) input
f(x, y)           3-6           the matched-filter output
G(x, y)           3-9           a thresholded gradient function
G(ω)              3-5           the matched-filter function
N_i               3-4           the mean-square noise voltage at the i-th tap
N(ω)              3-5           the transform of n(t)
|N(ω)|^2                        the power density spectrum of n(t)
N*(ω)             3-5           the conjugate of N(ω)
n(t)              3-2           an additive noise signal
r_i               Figure 3-1    the image signal at the i-th input retinal element of a pattern recognition logic
r_i,j                           the value of r(x, y) at the (i, j)-th tap of the matched spatial filter
r(x, y)           3-6           the noisy input image
S(iΔt)            3-3           the value of S(t) at the i-th sampling point
S(iΔx, jΔy)       3-6           the value of S(x, y) at the (i, j)-th sample point
S(t)              3-1           ideal noise-free input signal to be detected by a matched filter
s(t)              3-1           a signal defined over all t, equal to S(t) during 0 < t < T and zero otherwise
S(x, y)                         ideal noise-free input image to be detected by a matched spatial filter
S(ω)                            the Fourier transform of S(t)
S*(ω)             3-5           the conjugate of S(ω)
T                 3-1           duration of S(t)
t_0               3-2           starting time of S(t)
v(t)              4-9           a video signal derived by scanning an image with a spot
w_ij              3-7           the weight on r_i,j in the linear term of a discriminant function
w_ij;kl           3-7           the weight on the product r_i,j r_k,l in the quadratic term of a quadratic discriminant function
W_j               Figure 3-1    the weight on the j-th threshold output at the final decision summer
[illegible]       Figure 3-1    the weight on the i-th input element at the summer
X, Y              3-6           the spatial extent of S(x, y)
Δt                3-3           the Nyquist interval
Δx                3-6           spatial Nyquist interval
Δy                3-6           spatial Nyquist interval
θ                               a threshold value
σ_i^2             3-4           variance of the distribution of signal (voltage) values at the i-th tap

Section 6

Symbol            Equation                  Definition

[illegible]       6-4, 6-5, 6-35, 6-39      i-th coefficient of a linear classification function
[illegible]       6-7, 6-13                 ij-th coefficient of the quadratic term of a quadratic classification function
b_i               6-7, 6-13                 i-th coefficient of the linear term of a quadratic classification function
[illegible]       6-3, 6-5                  a constant
[illegible]       6-12                      constants arising from the logarithm of the likelihood ratio of joint distributions of binary variables, as specified on page 6-7
[illegible]                                 Mahalanobis generalized distance
D_N^2             6-24                      Mahalanobis generalized distance between two samples based on N measured characteristics
d_j                                         the difference between the mean values of the j-th variable in the two groups
E                                           expectation operator
E_g               6-11                      expectation taken with respect to the probability function of group g
F                 6-30                      the variance ratio of the F distribution
F_g                                         a multivariate probability distribution function for the g-th population
f_g                                         a multivariate probability density function for the g-th population
f̂_g                                         the sample estimate of f_g
G                                           the total number of groups
g                                           an index taking on values from 1 to G and denoting the group number
i                                           an index usually running from 1 to N and referring to the i-th variable
j                                           an index usually running from 1 to N and referring to the j-th variable
K                                           a positive integer used in the nonparametric procedure of Fix and Hodges
k                                           an index
k_jg              6-14                      the cost of classifying group g into group j
L                                           the likelihood ratio
μ^(g)                                       mean vector for the g-th group
μ_i^(g)           6-2                       the i-th element of μ^(g); denotes the mean value of the i-th variable from the g-th population
N                                           the number of variables measured; the dimension of the space in which samples of patterns are represented as points
n_g                                         the number of samples from group g
[illegible]                                 the a priori probability of group g
[illegible]                                 estimate of [illegible]
[illegible]                                 estimates for the density functions f_g obtained in the nonparametric procedure of Fix and Hodges
[illegible]                                 decision regions which divide the N-dimensional space in which samples are represented
R(j)                                        the expected risk of choosing group j
[illegible]                                 Rao's statistic for testing the worth of additional predictors
[illegible]       6-11                      second-, third-, and n-th order correlation parameters obtained in the expansion of the joint distribution of binary variables for group g
U                                           sample covariance matrix
U^-1              6-15                      inverse of sample covariance matrix
u_ij              6-18                      i,j-th element of sample covariance matrix
u^ij              6-24                      i,j-th element of the inverse of the sample covariance matrix
t                                           threshold used in likelihood ratio rule
[illegible]                                 ratio of the second-order correlation parameter for group g to the product of the standard deviations of the i-th and j-th variables in group g
V                                           population covariance matrix for group g
V^-1                                        inverse of population covariance matrix
v_ij                                        i,j-th element of population covariance matrix
v^ij                                        i,j-th element of inverse of population covariance matrix
W                                           within-samples scatter matrix
W^-1                                        inverse of within-samples scatter matrix
w_ij              6-27                      i,j-th element of within-samples scatter matrix
X                                           vector stochastic variable
X_i                                         the i-th element of the vector X; a predictor variable
X_i*                                        the i-th element of the selected set of predictors obtained by a screening procedure
x                                           observed values of the vector variable X
x'                                          transpose of the column vector x
x̄^(g)             6-16                      the mean of the vector x in group g
x̄_i^(g)           6-17                      the mean of n_g observations on the i-th variable x_i in group g
z_i^(g)                                     the standardized variable corresponding to x_i, obtained by subtracting the mean of x_i in group g from x_i and dividing the result by the standard deviation
[illegible]       6-35                      a linear function of the x_i
[illegible]                                 the mean value of the samples from group g after projection along a line


SECTION 1


INTRODUCTION


The principal objective of this program is to complete a broad, comprehensive survey of automatic pattern classification techniques and the alternative implementations of these techniques. The goal of this survey is to obtain insight into the feasibility and usefulness of semi-automatic imagery screening equipment based on these techniques as an aid to the photo interpreter. The work, therefore, is concentrated primarily in two main areas. The first of these is concerned with the fundamental processes involved in automatic detection and correlation of militarily significant patterns on aerial photographs. The second area concerns the capabilities and limitations of current and future techniques for carrying out these processes in practical equipment.

The design of a specific semi-automatic screening system is not a goal of the present program. System design is recognized as the next logical step in the overall objective of rendering the current research effort useful to the Army, presuming the prior demonstration of proof of feasibility or reasonable probability of success. Consequently, the current program is being pursued so as to obtain as much preliminary information as possible on the design of a system, consistent with the broader objectives of the general survey.

An analysis described in this report has demonstrated the need for a better and simpler device for the storage and correlation of picture data. A survey of new approaches to meet this need resulted in a concept, also described in this report, for a new type of device that appears capable of overcoming the problem. Another activity in the present program that will facilitate later system design is a brief study, summarized in the present report, of a preliminary conceptual design for a complete target classification system. These examples illustrate the broad survey approach, with design objectives in mind, which characterizes the present program.


SECTION 2


THE IMAGERY SCREENING PROBLEM


In order to carry out a comprehensive investigation of semi-automatic screening techniques, it is essential that the investigator have a general appreciation of the operational imagery screening problem. During this reporting period, therefore, Philco project personnel have discussed the problem with Dr. E. A. Rabben of DOD (WSEG), Mr. Lumen of the Army Intelligence Material Development Agency, and Dr. Saddacca and Dr. Birnbaum of the Army Personnel Research Office. They have also studied official Army publications¹,² related to the subject. A brief summary of the information thus obtained follows.

The photo interpreter normally must detect, identify, locate, and report as many as possible of the militarily significant objects, features, and clues in reconnaissance imagery. Sometimes his task is to look only for certain specific information, e.g., nuclear delivery units or armor concentrations.

In the past, each photo interpreter (P.I.) was assigned a particular area of responsibility, with which he became thoroughly familiar by studying basic photo cover, maps, and intelligence reports. When new photo coverage was to be obtained, the P.I. team would be informed in advance of the sorties to be flown, the coverage to be obtained, the scale, and the priorities for interpretation. Each P.I. would then collect and study, as time permitted, all available collateral information pertinent to the upcoming screening task. This collateral information was contained mainly in intelligence reports and old photo coverage of the area. When the new photography arrived, it would be divided up, and each member of the P.I. team would take the coverage in his area of responsibility. Obviously, this "area of responsibility" technique has its limitations. If most of the coverage fell in the area assigned to one or two men, they would either have to do all the screening that day or accept help from people who were not knowledgeable about the area.


1. Department of the Army Field Manual 30-20.

2. Department of the Army Pamphlet 381-1, "Combat Intelligence, Field Army, 1965-1975," September 1962.




A typical sortie would require each P.I. to go through about 100 photographs. The rapidity with which a man can screen imagery is extremely variable. An expert who is completely familiar with an area may screen most of his pictures in less than a minute each, while less expert P.I.'s who are not familiar with the area may take up to ten minutes per photograph.
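For scale, the figures above imply roughly the following screening time per sortie for one P.I. This is simple arithmetic on the numbers quoted in the text. It ignores any time not spent looking at photographs.

$$
100 \times 1\ \text{min} = 100\ \text{min} \approx 1.7\ \text{h}\ \ \text{(expert, upper bound)}, \qquad 100 \times 10\ \text{min} = 1000\ \text{min} \approx 16.7\ \text{h}\ \ \text{(less expert)}
$$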

The imagery screening task in 1965 and beyond will differ greatly from the typical World War II-Korea task described above. In the first place, a much greater volume of imagery will have to be screened. There probably will be many more photographs per day per unit area than in the past, and most of the photographs will be accompanied by high-resolution radar and IR imagery. Future tactical reconnaissance flights will provide three kinds of photo coverage in positive transparency rolls of 5" x 5" frames: panoramic, vertical (1:5000 to 1:12,000), and oblique. In addition, the same flights will produce 5" x 5" frames of high-resolution IR imagery at 1:7000 to 1:20,000 scale and 5" x 5" side-looking radar pictures at 1:250,000 to 1:7 million scale.

To screen such a large volume of imagery effectively, the P.I. team will need the best assistance that modern technology can provide in a field environment. The Army is developing the Tactical Imagery Interpretation Facility (TIIF) for this purpose. The first TIIF will contain advanced information-storage, retrieval, and display facilities to provide the P.I. with all available collateral information in optimum format. Eventually, the TIIF should evolve into a much more sophisticated system. That system would incorporate computerized collation and analysis of intelligence data, and pattern recognition equipment for screening all types of reconnaissance imagery.

Under the present contract, Philco is directing its attention toward the conceptual design of automatic classification techniques capable of screening high-resolution, vertical aerial reconnaissance photographs. The aim is to locate those areas which most probably contain tactical targets of interest to the photo interpreter. The present phase of the work is directed toward the development of techniques capable of recognizing exposed, uncamouflaged targets of a tactical nature, e.g., tanks, vehicles, and aircraft (which are clearly recognizable to the human observer). These techniques work without reference to collateral clues such as tank trackage, location of the possible target on a road or runway, or the nature of the local terrain. This is only one part of the overall imagery screening problem. However, this problem must be solved with relatively simple techniques which can be implemented compactly and economically as a first step toward a useful semi-automatic screening system.


SECTION 3


FUNDAMENTAL CONCEPTS AND TECHNIQUES


3.1 Basic Recognition Concepts

This section presents a description and evaluation of the fundamental processes and techniques involved in the detection of targets in gray-scale imagery. Later sections of the report cover specific technical points relating to the underlying statistical theory as well as the implementation.

In many respects the problem of detecting targets in gray-scale imagery is analogous to the classical communications problem of detecting a signal in noise. Both problems are amenable to solution through the application of statistical techniques. There are, however, important differences between the two situations. In communication, the problem is usually that of detecting a known signal in a background of additive noise. The noise is usually assumed to be normally distributed in amplitude and independent of frequency (or "white") over the portion of the spectrum of interest.

In such a situation, the output signal-to-noise ratio of the receiver is maximized by use of a "matched filter." This filter may be implemented by a delay line with output taps spaced at the Nyquist interval. The signal outputs from the taps are combined in a weighted summation.

If the ideal input signal to the communications receiver is

$$
s(t) = \begin{cases} S(t), & 0 < t < T \\ 0, & \text{otherwise} \end{cases} \tag{3-1}
$$

and the actual corrupted signal is

$$
e(t) = n(t) + s(t - t_0) \tag{3-2}
$$

then the output from a matched filter will be

$$
f(t) = \sum_{i=0}^{T/\Delta t} e(t - i\,\Delta t)\, S(i\,\Delta t) \tag{3-3}
$$

where Δt is the Nyquist interval.

A simple threshold θ makes the following decisions:

if f(t) > θ, then a signal is present at time t;

if f(t) < θ, then only noise is present at time t.
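The matched-filter detection of Eqs. 3-1 to 3-3, followed by the threshold rule above, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the report's equipment:

- The pulse shape, noise level, offset t0 and threshold (half the signal energy) are hypothetical.
- The helper names `matched_filter_output` and `detect` are invented for the example.
- The sketch computes the correlation directly rather than simulating a delay line tap by tap.
- It labels each output sample by the candidate start time t0 rather than by the current time t. The two forms differ only in how output samples are labeled and in which end of the template sits on the first tap.

```python
import random

def matched_filter_output(e, template):
    """Sliding cross-correlation of the sampled input e with the sampled template.

    out[k] = sum_i e[k + i] * template[i] for every offset k at which the
    whole template fits inside e, so out[k] is largest when the signal
    starts at sample k (the t0 of Eq. 3-2).
    """
    m = len(template)
    return [sum(e[k + i] * template[i] for i in range(m))
            for k in range(len(e) - m + 1)]

def detect(outputs, theta):
    """Threshold rule: report 'signal present' at every offset where f > theta."""
    return [k for k, f in enumerate(outputs) if f > theta]

# Hypothetical test signal: a 5-sample pulse added at offset t0 to white
# Gaussian noise (standard deviation 0.1) on 100 samples.
random.seed(1)
template = [1.0, 2.0, 3.0, 2.0, 1.0]
t0 = 40
e = [random.gauss(0.0, 0.1) for _ in range(100)]
for i, s in enumerate(template):
    e[t0 + i] += s

f = matched_filter_output(e, template)
peak = max(range(len(f)), key=f.__getitem__)
theta = 0.5 * sum(s * s for s in template)  # hypothetical threshold: half the signal energy
hits = detect(f, theta)
print("peak offset:", peak, "threshold crossings:", hits)
```

A lower threshold raises the chance of false alarms in noise alone. A higher one raises the chance of missing a weak signal. The choice here is only for illustration.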

The matched-filter concept can be derived from an analysis of linear filter functions, which are the same as linear discriminant functions in statistical decision theory. The matched filter is, in fact, a realization of a linear discriminant function operating on the independent random variables e_i(t) = e(t - iΔt). Its operation is based on the assumption that the random variables e_i(t) are normally distributed, with

$$
\sigma^2\!\left[e_i(t)\right] = N_i \tag{3-4}
$$

where N_i is the mean-square noise voltage at the i-th tap. The above equation holds both for signal plus noise and for noise alone.

A more general expression for the matched filter is the function

$$
G(\omega) = \frac{S^*(\omega)}{N(\omega)\,N^*(\omega)} \tag{3-5}
$$

which is applicable to cases where |N(ω)| is not evenly distributed over the spectrum. This function may have a transform, g(t), that can be implemented in a finite tapped-delay-line cross-correlator, or it may theoretically require an infinite delay line. In practice, however, implementing these filters with finite delay-line cross-correlators usually yields good approximations to the desired result.
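Equation 3-5 weights each frequency component of the signal by the inverse of the noise power at that frequency. A sampled-time analog of the same idea arises when the tap noises are independent but have unequal mean-square values N_i. It is offered here only as an illustration.

In that case the output signal-to-noise ratio is (Σ w_i S_i)² / Σ w_i² N_i. By the Cauchy-Schwarz inequality, it is maximized by weights proportional to S_i / N_i, and its maximum is Σ S_i² / N_i. The sketch below checks this numerically. The tap values, the noise powers and the name `output_snr` are hypothetical.

```python
def output_snr(weights, signal, noise_var):
    """Output SNR of a weighted tap sum, (sum w_i S_i)^2 / sum w_i^2 N_i,
    assuming zero-mean noise that is independent from tap to tap."""
    num = sum(w * s for w, s in zip(weights, signal)) ** 2
    den = sum(w * w * n for w, n in zip(weights, noise_var))
    return num / den

# Hypothetical tap values: S(i*dt) and per-tap noise powers N_i, with the
# centre tap much noisier than the others.
signal = [1.0, 2.0, 3.0, 2.0, 1.0]
noise_var = [0.5, 0.5, 4.0, 0.5, 0.5]

plain = list(signal)                                   # white-noise weights S_i
whitened = [s / n for s, n in zip(signal, noise_var)]  # weights S_i / N_i

snr_plain = output_snr(plain, signal, noise_var)
snr_whitened = output_snr(whitened, signal, noise_var)
bound = sum(s * s / n for s, n in zip(signal, noise_var))  # sum S_i^2 / N_i

print(f"SNR with S_i weights:     {snr_plain:.2f}")
print(f"SNR with S_i/N_i weights: {snr_whitened:.2f}  (bound {bound:.2f})")
```

In this example the weighted form de-emphasizes the noisy centre tap even though it carries the largest signal sample. That is the same trade-off Eq. 3-5 makes across frequencies.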


The operation in pattern recognition analogous to the matched filter is the application of a linear discriminant function to the two-dimensional gray-scale signal defining the image. Such a process yields f(x, y), where

$$
f(x, y) = \sum_{i=1}^{X/\Delta x} \sum_{j=1}^{Y/\Delta y} r(x - i\,\Delta x,\; y - j\,\Delta y)\, S(i\,\Delta x,\; j\,\Delta y) \tag{3-6}
$$
where Δx and Δy are the spatial Nyquist sampling intervals, and the S(iΔx, jΔy) are linear weights. Equation 3-6 describes a pattern recognition technique known as "template matching." It corresponds to making a two-dimensional cross-correlation of the input imagery against an optimized representation of the pattern.
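Equation 3-6 can be illustrated as a discrete two-dimensional cross-correlation. It slides a template over a sampled image and reports where the response is largest. In this sketch the template, the scene, and the function names `template_match` and `best_match` are hypothetical. Each output index is the upper-left corner of the template position, not the (x, y) labeling of Eq. 3-6.

```python
import random

def template_match(image, template):
    """Discrete 2-D cross-correlation in the spirit of Eq. 3-6:
    out[y][x] = sum over (j, i) of image[y + j][x + i] * template[j][i]."""
    h, w = len(template), len(template[0])
    rows, cols = len(image) - h + 1, len(image[0]) - w + 1
    return [[sum(image[y + j][x + i] * template[j][i]
                 for j in range(h) for i in range(w))
             for x in range(cols)]
            for y in range(rows)]

def best_match(out):
    """(row, col) of the largest correlation value."""
    return max(((y, x) for y in range(len(out)) for x in range(len(out[0]))),
               key=lambda p: out[p[0]][p[1]])

# Hypothetical zero-mean 4x4 template: a bright 2x2 centre (+3) inside a
# dark ring (-1); its weights sum to zero, so flat background gives no response.
template = [[-1, -1, -1, -1],
            [-1,  3,  3, -1],
            [-1,  3,  3, -1],
            [-1, -1, -1, -1]]

# Hypothetical 20x20 scene: background 0.5 plus small noise, and a bright
# 2x2 object whose 4x4 neighbourhood starts at row 10, column 6.
random.seed(2)
image = [[0.5 + random.gauss(0.0, 0.05) for _ in range(20)] for _ in range(20)]
for y in (11, 12):
    for x in (7, 8):
        image[y][x] += 1.0

out = template_match(image, template)
print("best match at", best_match(out))
```

The template's weights were chosen to sum to zero so that uniformly bright regions produce no large response on their own. With an all-positive template, the raw correlation tends to favor bright areas rather than the target shape.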

Past experience at Philco in designing recognition equipment, e.g., machine-print and hand-print readers, has shown that template-matching techniques normally are not adequate. The use of the linear discriminant function is optimum only under a very restricted set of conditions. The random variables r_ij = r(x - iΔx, y - jΔy) must be normally distributed. The covariance matrix relating the r_ij must also be the same for both the target and nontarget cases (see Section 6). These conditions do not hold for typical gray-scale inputs or for black-and-white alphanumerics.

Figure 3-1 is a block diagram representative of a number of suggested pattern recognition systems. These systems all have one feature in common: they avoid the limitations of the linear discriminant approach through the use of intermediate sets of non-linear decision elements.

Referring to Figure 3-1, "Simple Layered-Decision Pattern Recognition Logic," many subsets of retinal elements are connected in weighted linear summations, as in Equation 3-6, to threshold elements. Each threshold element gives a unit output if its weighted sum exceeds the threshold, and zero otherwise. The outputs from the threshold elements represent a new set of random variables. These are combined in a weighted linear summation feeding a final threshold element, whose output constitutes the final decision.
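The two-layer structure of Figure 3-1 can be sketched as below. Subsets of retinal elements feed weighted summers and threshold units whose binary outputs f_j are combined, with weights W_j, in a final summer and threshold. The 3x3 retina, the two bar-shaped "features", and all weights and thresholds are hypothetical. They were chosen only to make the data flow visible. The function names are likewise illustrative.

```python
def threshold_unit(inputs, weights, theta):
    """Unit output 1 if the weighted sum of the inputs exceeds theta, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > theta else 0

def layered_decision(retina, subsets, weights, thetas, final_weights, final_theta):
    """Two-layer logic after Figure 3-1.

    subsets[j] lists the retinal element indices wired to summer j; weights[j]
    and thetas[j] are that summer's weights and threshold. The binary outputs
    f_j are combined by the final summer (weights W_j) and final threshold.
    """
    features = [threshold_unit([retina[i] for i in subsets[j]], weights[j], thetas[j])
                for j in range(len(subsets))]
    return features, threshold_unit(features, final_weights, final_theta)

# Hypothetical 3x3 retina, flattened row by row (index = 3*row + col).
subsets = [(0, 3, 6),   # left column  -> "vertical bar" feature
           (6, 7, 8)]   # bottom row   -> "horizontal bar" feature
weights = [(1, 1, 1), (1, 1, 1)]
thetas = [2.5, 2.5]     # each feature fires only if all three elements are bright
final_weights, final_theta = (1, 1), 1.5   # final decision: both features present

corner = [1, 0, 0,
          1, 0, 0,
          1, 1, 1]
bar = [1, 0, 0,
       1, 0, 0,
       1, 0, 0]
print(layered_decision(corner, subsets, weights, thetas, final_weights, final_theta))
print(layered_decision(bar, subsets, weights, thetas, final_weights, final_theta))
```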

This basic structure describes feature-mapping logics. In these logics the elements in each subset are taken from a small local area of the image. They are weighted in a manner that constitutes a matched filter, or template, for specific sub-portions of the target. The binary random variables from the threshold circuits indicate the presence or absence of specific features in the pattern.

The basic structure described here is used both in the simple perceptrons¹ and by Gamba in his PAPA² machine. In these systems, the retinal elements connected to each summer and threshold circuit are selected and weighted randomly within the constraints of the particular model. The selection of retinal elements purely at random is not an efficient approach to feature selection. The techniques used have therefore varied from random to partially random to completely deterministic wiring schemes, depending upon the motivation of the investigator. Gamba, for example, used random-line masks for alphanumeric recognition in an attempt to take advantage of the connectivity property of printed characters.

1. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, Washington, D.C., 1962.

2. G. Palmieri and R. Sanna, "Automatic Probabilistic Programmer Analyser for Pattern Recognition," Methodos, Vol. XII, Number 48, 1960.

Figure 3-1 Simple Layered-Decision Pattern Recognition Logic

In both the "feature map" and the "random mask" systems, the output from the intermediate thresholds is a set of binary random variables. Various adaptive and statistical techniques have been recommended and used to analyze these variables with linear and higher-order discriminant functions. Statistical techniques will be reviewed in Section 6 of this report.

The principal property that relates the systems just described is the intermediate set of subdecisions, or thresholds. At first glance the thresholding operation appears undesirable because it destroys information, i.e., it converts an analog voltage amplitude into a binary signal. However, suppose the intermediate thresholds θ_1, θ_2, ... were eliminated from the logic network of Figure 3-1. The resultant network would then be equivalent to a single linear discriminant function, or template-matching operation, over the entire retinal space, since a weighted sum of weighted sums is itself a single weighted sum of the retinal elements.

The threshold elements, in association with the linear summations, transform the retinal data. They produce a new set of random variables in which the important information is distributed in the marginal probabilities and first-order correlation coefficients, where it is more readily accessible to the discriminant function which follows. The specification of threshold elements for this application is not general; other non-linear functions may be applicable in special cases.

The threshold elements, in association with the linear summations, transform the retinal data, thereby producing a new set of binary random variables which are the input variables for another discriminant function.

This combination of two layers of linear processing with an intermediate threshold provides a simple and effective way to extract image information that lies in the interelement relationships and is not accessible to a simple linear discriminant function. The effectiveness of multilayer systems has been demonstrated at Philco in multiple-font print-reading equipment.
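A minimal example of information that lies only in the relationship between elements is the decision "exactly one of two retinal elements is bright." This decision is not linearly separable, so no single threshold element can realize it with any weights. Two threshold units followed by a final threshold can.

The sketch below uses hypothetical weights and the illustrative names `unit` and `two_layer`. It shows the two-layer solution and an exhaustive search over a coarse weight grid that finds no single-element solution. The search is only an illustration, not a proof.

```python
from itertools import product

def unit(x, w, theta):
    """Threshold element: 1 if sum(w_i * x_i) > theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

def two_layer(x):
    """Hypothetical two-element target: 'exactly one element bright'."""
    a = unit(x, (1, 1), 0.5)           # at least one element bright
    b = unit(x, (1, 1), 1.5)           # both elements bright
    return unit((a, b), (1, -1), 0.5)  # a and not b

patterns = list(product((0, 1), repeat=2))
target = {p: p[0] ^ p[1] for p in patterns}

# Exhaustive search for a single threshold element on a coarse grid of
# weights and thresholds (-3.0 to 3.0 in steps of 0.5).
grid = [k / 2 for k in range(-6, 7)]
single_layer = [(w1, w2, th) for w1 in grid for w2 in grid for th in grid
                if all(unit(p, (w1, w2), th) == target[p] for p in patterns)]

print("two-layer logic correct:", all(two_layer(p) == target[p] for p in patterns))
print("single-element solutions on grid:", len(single_layer))
```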

Another non-linear processing operation that is useful in pattern recognition is detail or boundary detection (see Section 3.2 below). This detection is accomplished by thresholding a spatial derivative (gradient) of image brightness. The result transforms the gray-scale picture into a black-and-white "line-drawing" representation that tends to outline objects while reducing the extraneous clutter contained in the original gray-scale image.
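Boundary detection by thresholding a brightness gradient can be sketched as follows. The forward-difference gradient, the threshold value, the test patch and the name `thresholded_gradient` are hypothetical choices for illustration. The report does not specify the derivative operator in this passage.

```python
def thresholded_gradient(img, theta):
    """Binary edge map: 1 where the brightness-gradient magnitude exceeds theta.

    The gradient is approximated by forward differences, a hypothetical choice;
    the last row and column, which have no forward neighbour, are left at 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if (gx * gx + gy * gy) ** 0.5 > theta:
                edges[y][x] = 1
    return edges

# Hypothetical 8x8 gray-scale patch: dark background (0.2) with a bright
# square (0.8) occupying rows 2-5 and columns 2-5.
img = [[0.8 if 2 <= y <= 5 and 2 <= x <= 5 else 0.2 for x in range(8)]
       for y in range(8)]
edges = thresholded_gradient(img, theta=0.3)
for row in edges:
    print("".join("#" if v else "." for v in row))
```

The printed map outlines the square with a thin band of edge marks, while the uniform interior and background stay blank. This is the "line-drawing" effect described above.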


Another type of non-linear filtering operation not only adds up weighted sums of retinal element brightness levels, as in Equation 3-6, but also takes into account the correlations between brightness levels at different points. This type of operation results in quadratic and higher-order discriminant functions. These should offer much better performance than simple linear discriminant functions and reduce the number of decision layers needed. A quadratic discriminant function would take the form


$$
f(x, y) = \sum_{i}\sum_{j}\sum_{k}\sum_{l} w_{ij;kl}\, r_{i,j}\, r_{k,l} \;+\; \sum_{i}\sum_{j} w_{ij}\, r_{i,j} \tag{3-7}
$$

Implementation of such a function is difficult because the number of distinct cross-product terms in the sum is n(n-1)/2, where n is the total number of retinal elements (at least 1000) required to describe a typical pattern. For n = 1000 this is already 499,500 terms. The problem becomes greater with higher-order functions. Also, if the input data are not binary, the implementation of the product terms is not simple.
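The quadratic form of Eq. 3-7 and the term count quoted above can be sketched as follows, with the two-dimensional index (i, j) flattened to a single index. The function names and the two-element weights are hypothetical. The weights show that a single quadratic term can express an inter-element relationship ("exactly one element bright") that a linear function alone cannot. That is the sense in which quadratic functions may reduce the number of decision layers.

```python
def quadratic_discriminant(r, w_quad, w_lin):
    """Quadratic discriminant in the form of Eq. 3-7, with the 2-D retinal
    index (i, j) flattened to one index a:
        f = sum_a sum_b w_quad[a][b] * r[a] * r[b] + sum_a w_lin[a] * r[a]
    """
    n = len(r)
    quad = sum(w_quad[a][b] * r[a] * r[b] for a in range(n) for b in range(n))
    lin = sum(w_lin[a] * r[a] for a in range(n))
    return quad + lin

def distinct_product_terms(n):
    """Number of distinct cross-products r_a * r_b with a < b."""
    return n * (n - 1) // 2

# Hypothetical two-element weights: the symmetric cross term contributes
# -2 * r0 * r1, so with binary inputs f = 1 exactly when one element is bright.
w_quad = [[0, -1], [-1, 0]]
w_lin = [1, 1]
for r in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(r, quadratic_discriminant(r, w_quad, w_lin))

print(distinct_product_terms(1000))
```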

Practical considerations, therefore, limit the amount of higher-order statistical correlation that may be accounted for in recognition logic networks. It is therefore necessary to transform the input random variables (the retinal element data) into a new and smaller set of random variables. This new set should contain the relevant target information in such a manner as to be accessible to simple linear, or at most quadratic, discriminant functions.

No complete analytic approach to this problem of transforming the retinal data has been discovered. The problem must be solved heuristically, with application of statistical techniques wherever appropriate.

To recognize a relatively small object of military interest, the photograph can be scanned block by block with overlap between blocks. The individual blocks and the amount of overlap are made just large enough to ensure that the object will appear wholly within the block in some one position. The object may then appear in any position and in any orientation within the block.
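The block size and overlap condition can be made concrete along one image axis. If an object spans obj resolution elements and blocks are block elements wide, consecutive blocks must overlap by at least obj - 1 elements. Equivalently, the stride can be at most block - obj + 1. The sketch below checks this by brute force.

The numbers and function names are hypothetical. Because the object may appear at any orientation, obj is taken here as its largest extent over all orientations. The same rule applies separately to rows and columns.

```python
def block_starts(length, block, stride):
    """Start positions of overlapping blocks along one image axis.  The last
    block is pinned to the image edge so the whole axis is covered
    (assumes length >= block)."""
    starts = list(range(0, length - block + 1, stride))
    if starts[-1] != length - block:
        starts.append(length - block)
    return starts

def max_stride(block, obj):
    """Largest stride that still leaves every object of width obj (obj <= block)
    wholly inside at least one block."""
    return block - obj + 1

def every_position_covered(length, block, stride, obj):
    """Brute-force check over all integer object positions along the axis."""
    starts = block_starts(length, block, stride)
    return all(any(s <= p and p + obj <= s + block for s in starts)
               for p in range(length - obj + 1))

# Hypothetical numbers: 200-element scan line, 32-element blocks, object
# extent of 12 elements in any orientation.
L, B, s = 200, 32, 12
S = max_stride(B, s)
print(S, every_position_covered(L, B, S, s), every_position_covered(L, B, S + 1, s))
```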

Recognition schemes based on detection of geometrical features must provide for detection of the appropriate features in every possible position and orientation of the object within the block. Testing for different translations at one orientation can be accomplished electronically with one set of feature masks. The signals from all resolution elements in the block are moved through a shift register or a tapped delay line. In this report, this procedure is referred to as registry testing.

To test for unsymmetrical objects at all orientations, it is necessary to provide feature detectors at different angles. For example, suppose a straight-line detector can detect lines oriented within about ±12 degrees of a zero (reference) line. Eight separate line detectors must then be provided at intervals of 22-1/2 degrees. These eight cover the 180-degree range of line orientations, leaving no line more than 11-1/4 degrees from the nearest detector. For the detection of translationally or rotationally invariant features, such as isotropic texture differences, registry testing and testing for separate orientations are not required.
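The detector count in the example above follows from dividing the 180-degree range of line orientations into sectors no wider than twice the detector's tolerance. The minimal sketch below uses hypothetical function names. For a tolerance of ±12 degrees it gives 8 detectors spaced 22.5 degrees apart, leaving no line more than 11.25 degrees from the nearest detector.

```python
import math

def detectors_needed(half_width_deg, period_deg=180.0):
    """Fewest equally spaced oriented detectors such that every orientation
    lies within half_width_deg of some detector, and their spacing.
    Straight lines are undirected, so line orientation repeats every 180 deg."""
    n = math.ceil(period_deg / (2.0 * half_width_deg))
    return n, period_deg / n

def worst_case_offset(n, period_deg=180.0):
    """Largest angle between any orientation and its nearest detector."""
    return period_deg / n / 2.0

n, spacing = detectors_needed(12.0)
print(n, spacing, worst_case_offset(n))
```

Some features have a direction, such as a corner pointing one way. Their orientation repeats only every 360 degrees. Calling `detectors_needed(12.0, 360.0)` under the same assumptions gives 15 detectors.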

Using the foregoing basic concepts, the design approach to an image screening system is presented in Tables 3-1 and 3-2 and in Figure 3-2, "Conceptual Design of Image Screening System." The question arises as to which image characteristics fall into the various classes of parameters, simple objects, and final decisions. To illustrate the general trend, Table 3-3, "Identification of Heavy Antiaircraft Artillery Emplacement," has been drawn up from material available in World War II P.I. manuals.¹


1. TM 30-246, "Tactical Interpretation of Air Photos," Dept. of the Army,
February 1954 (also Navaer 10-35-613).
TABLE 3-1

DATA PREPARATION ELEMENTS OF IMAGE SCREENING SYSTEM

Unit                          Function

Mechanical Transport          Passes images automatically into view of the
                              scanner. Probably a simple unit for continuous
                              film, more complex for photographs.

Optical or Electro-optical    Scans images by a combination of serial and
Scanner                       parallel operations, determined by trade-offs
                              between the desire for rapid handling and the
                              desire for compact equipment.

Video Data Preprocessor       Normalizes video for scale, contrast, etc.
                              Edges detected, silhouetting performed if

Storage Unit                  Stores parts of the image information for
                              correlation of information spanning more than
                              one scan area. Note that using the image itself
                              as a memory as much as possible will minimize
                              the need for high-speed, high-capacity
                              electronic memory.

TABLE 3-2

DECISION ELEMENTS OF IMAGE SCREENING SYSTEM

Unit                          Function

Parameter Measurement Unit    Extracts detailed information preliminary to
                              and required for the final decision. This
                              preliminary information includes the presence
                              of straight lines, intersecting lines, right
                              angles, distinctive texture, etc.

Simple-Object Recognition     From the parameters (see above), decisions are
Unit                          made regarding the presence or absence of
                              simple objects, e.g., trucks, cable trenches,
                              buildings, etc.

Final Decision Unit           Assigns a priority to the photograph or image
                              in terms of the probability that the complex
                              target of current interest is present, e.g.,
                              missile sites, camouflaged gun emplacements,
                              etc.


TABLE 3-3

IMAGE SCREENING SYSTEM APPLIED TO
LOCATING HEAVY AA EMPLACEMENTS

Identification Characteristic    Functional Unit in Image Screening System

Circular emplacement,            Parameter Measurement Unit (PMU)
16' to 35' diameter

Guns in units of 4, 6, or ?      Simple Object Recognition Unit (SORU)
with central CP

Horizontal support girders       PMU - Straight-line detection
(for heavy AA guns)

Generator dugout                 SORU - Look for characteristic size
                                 and irregular shape

Radar and sound location         SORU - May involve a complex of
equipment                        parameter measurement outputs

Searchlight positions            SORU - May involve a complex of
                                 outputs from the PMU

Cable trenches                   PMU - Line detection

Crew quarters                    PMU

Vehicle and personnel paths      SORU
Preprocessing Techniques


Geometric  Normalization 

Geometric normalization is a preliminary preprocessing technique for
eliminating scale variations in an aerial photograph. It is common practice to
record the altitude of the reconnaissance aircraft on each frame. This
information can be used directly, in conjunction with ground elevation
information, to compute the adjustment needed to normalize the input to a
particular scale factor. This adjustment can be made optically in the sensor
portion of the system, or electronically by adjusting the sweep waveforms of
the electronic scanner.
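A minimal sketch of the scale computation follows. It assumes a vertical photograph and a known camera focal length, neither of which is specified in the text, and the function names are hypothetical. For a vertical photograph, the scale is the focal length divided by the height of the camera above the ground. The recorded altitude and the ground elevation therefore fix the magnification needed to reach a common reference scale.

```python
def photo_scale(focal_length_m, altitude_m, ground_elevation_m):
    """Scale of a vertical aerial photograph: focal length / height above ground."""
    height_above_ground = altitude_m - ground_elevation_m
    if height_above_ground <= 0:
        raise ValueError("aircraft must be above the terrain")
    return focal_length_m / height_above_ground

def normalizing_magnification(focal_length_m, altitude_m, ground_elevation_m,
                              reference_scale):
    """Factor by which the image must be magnified to bring it to reference_scale
    (assumption: a single scale factor for the whole vertical frame)."""
    return reference_scale / photo_scale(focal_length_m, altitude_m,
                                         ground_elevation_m)

# Example: 6-inch (0.1524 m) lens, 1,800 m altitude, terrain at 276 m -> 1:10,000.
m = normalizing_magnification(0.1524, 1800.0, 276.0, 1 / 5000)
```

Under these assumptions, the 1:10,000 photograph must be magnified by about 2 to reach 1:5,000. Electronically, the same effect could be obtained by reducing the scanner sweep on the photograph by the factor 1/m.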

Geometrical normalization is sometimes assumed to include a process
whereby objects of interest are centered and placed in a fixed orientation
within the frame to be studied. Before this can be done, however, the object
of interest must at least tentatively be identified and the angular position of
some of its geometrical features determined. It is therefore believed unrealistic
to speak of centering and orienting the object prior to identification, and these
operations are not included under the heading of geometrical prenormalization
in the present study. Further, in recognition systems employing
element-by-element registry testing, this type of normalization is not necessary,
because the target is framed automatically.


Many objects are easily recognized if they are reduced to a line
drawing of their principal contours. Techniques are therefore useful that
eliminate the low-spatial-frequency gray-scale data and produce a black-and-white
reproduction of the high-detail areas, including the object contours. Two
candidate techniques offering advantages for detail detection have been studied
in detail: the Laplacian of the image brightness and the gradient of the image
brightness. Combined with the lineness editing discussed later, these techniques
offer a good approximation to the line-drawing function. A "best" technique would
reproduce all regularly curving brightness transitions regardless of contrast
ratio and would filter out all other data. As yet, no such technique has been
perfected.


The Laplacian is given by the equation

$$\nabla^2 B = \frac{\partial^2 B}{\partial x^2} + \frac{\partial^2 B}{\partial y^2} \qquad (3\text{-}5)$$
where B is the image brightness and x and y are the two image dimensions.
Since the function is determined by brightness derivatives rather than by
brightness itself, it produces no output on large uniform areas. In the
presence of line boundaries or corners, the output is high. Thus, the function
tends to outline objects and emphasize their contours. It can be simply
quantized to two levels without losing outline information. Alternatively, it
can be rectified by taking |∇²B| and then quantized to produce a passable
line drawing of gray-scale image material. Laplacian preprocessing has been
used successfully in the Philco Post Office Address Reader. It is identical in
function to the two-dimensional detail filter of Taylor¹ and the convexity
detector of Harmon and Van Bergeijk.² A function similar to the Laplacian has
been observed in the frog's retinal ganglion cells by Lettvin et al.,³ and in the
lateral geniculate body of the cat by Hubel and Wiesel.⁴
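A minimal discrete sketch follows. It assumes that the brightness B is sampled on a square grid of unit spacing and stored as a list of rows. The common five-point stencil (the sum of the four nearest neighbors minus four times the center) stands in for ∇²B. This is an approximation chosen for illustration, not a description of the Philco reader. The sketch shows both outputs mentioned above: a two-level quantization and a rectified, thresholded line drawing.

```python
def laplacian(image):
    """Five-point discrete Laplacian of a brightness array (list of rows).

    Border elements, whose neighborhood is incomplete, are set to 0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = (image[y][x - 1] + image[y][x + 1] +
                         image[y - 1][x] + image[y + 1][x] - 4 * image[y][x])
    return out

def quantize_two_level(values, threshold=0):
    """Two-level quantization: 1 where value > threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in values]

def rectified_line_drawing(image, threshold):
    """Rectify (take |Laplacian|), then quantize to obtain a rough line drawing."""
    lap = laplacian(image)
    return quantize_two_level([[abs(v) for v in row] for row in lap], threshold)
```

On a vertical step from brightness 0 to 10, the output is +10 on the dark side of the edge, -10 on the bright side, and zero over the uniform areas. The rectified, thresholded result therefore marks a thin outline along the step.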


The Laplacian function can be implemented in several ways, both
optically and electronically. Optical techniques use coherent optical
spatial filtering or a two-channel lensless correlograph. Electronic techniques
may use special scanning procedures with modulation of the flying-spot
focus, or conventional scanning and lumped-constant delay-line filters.
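The conventional-scanning option can be imitated in software. In the sketch below, a fixed-length `deque` stands in for a tapped delay line (or shift register) holding the last two scan lines plus two elements. The tap positions and names are illustrative assumptions, not a circuit from this program. Because picture elements arrive one at a time in raster order, the neighbors of an element are simply earlier samples delayed by one element, one line, or two lines.

```python
from collections import deque

def streaming_laplacian(image):
    """Five-point Laplacian computed from a raster-scanned stream of elements.

    taps[d] holds the sample delayed by d steps. Taps at delays 1, cols, cols+1,
    cols+2 and 2*cols+1 present the neighbors and the center of the element one
    line and one element behind the current one.
    """
    rows, cols = len(image), len(image[0])
    taps = deque(maxlen=2 * cols + 2)
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            taps.appendleft(image[y][x])  # newest sample sits at delay 0
            if y >= 2 and x >= 2:
                center = taps[cols + 1]              # B[y-1][x-1]
                neighbors = (taps[1]                 # B[y][x-1]
                             + taps[cols]            # B[y-1][x]
                             + taps[cols + 2]        # B[y-1][x-2]
                             + taps[2 * cols + 1])   # B[y-2][x-1]
                out[y - 1][x - 1] = neighbors - 4 * center
    return out
```

For every interior element, the result equals a direct evaluation of the same five-point stencil. The cost of serial access is a delay of about one scan line before each output becomes available.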


The Laplacian function, or spatial second derivative, described above
is a scalar quantity derived by a linear process. The gradient, or spatial
first derivative, is a vector quantity that indicates the magnitude and direction
of the slope. Its magnitude cannot be derived by a single linear process;
however, simple electronic techniques can be implemented that derive the
magnitude of the gradient. This function serves much the same purpose as the
Laplacian but has better signal-to-noise performance (see Appendix A). When the
output is to be thresholded, it is often easier to work with the square of the
gradient magnitude, giving an output of the form

$$|\nabla B|^2 = \left(\frac{\partial B}{\partial x}\right)^2 + \left(\frac{\partial B}{\partial y}\right)^2$$


1. Taylor, "Pattern Recognition by Means of Automatic Analog Apparatus,"
I.E.E., March 1959.

2. L. D. Harmon and W. A. Van Bergeijk, "What Good Are Artificial Neurons,"
Symposium on Bionics, 13-15 September 1960.

3. J. Y. Lettvin et al., "What the Frog's Eye Tells the Frog's Brain,"
Proceedings of the IRE, November 1959.

4. Hubel and Wiesel, "Receptive Fields of Single Neurones in the Cat's
Striate Cortex," Journal of Physiology, 1959.
The angle of the gradient vector also may be determined electronically
by measuring the horizontal and vertical components of the gradient and
computing the arctangent of their ratio. The angular measurements in a local
area then can be examined to determine whether there is a linear boundary. Such a
boundary would result in an array of parallel gradient vectors. This would be
true regardless of contrast ratio.
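The gradient computations of the last two paragraphs can be sketched as follows. The sketch assumes unit grid spacing and central differences for the two components, an illustrative choice. `math.atan2` plays the role of the arctangent of the component ratio while keeping track of the quadrant. The parallelism test treats vectors that differ by 180° as parallel and uses a hypothetical angular tolerance; both choices are assumptions.

```python
import math

def gradient(image, x, y):
    """Central-difference estimate of (dB/dx, dB/dy) at an interior element."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy

def gradient_magnitude_squared(image, x, y):
    gx, gy = gradient(image, x, y)
    return gx * gx + gy * gy

def gradient_angle_deg(image, x, y):
    """Angle of the gradient vector from its horizontal and vertical components."""
    gx, gy = gradient(image, x, y)
    return math.degrees(math.atan2(gy, gx))

def is_linear_boundary(image, points, min_mag_sq, tolerance_deg=10.0):
    """True if every strong gradient at the given (x, y) points is parallel
    (modulo 180 degrees) to the first one, as expected along a straight boundary."""
    angles = [gradient_angle_deg(image, x, y) % 180.0
              for x, y in points
              if gradient_magnitude_squared(image, x, y) >= min_mag_sq]
    if len(angles) < 2:
        return False
    ref = angles[0]
    for a in angles[1:]:
        d = abs(a - ref) % 180.0
        if min(d, 180.0 - d) > tolerance_deg:
            return False
    return True
```

A 0-to-1 step and a 0-to-100 step give the same gradient angle along the boundary, which shows why the parallelism test is independent of contrast ratio. Only the magnitude threshold has to scale with contrast.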

Lineness  Editing 

Gradient-magnitude and Laplacian images produce acceptable line
drawings on some input material. Certain types of clutter, however, such as
forests, produce strong outputs in the absence of straight lines. If desired,
this kind of clutter can be eliminated by subjecting a two-level clipped Laplacian
or clipped gradient-magnitude image to a subsequent anisotropic filtering
operation. A long, thin aperture is used that is weighted positively in the
center and negatively around its periphery, very much like that for the Laplacian
except for an elongation in one dimension. The output of this aperture is then
presented to a threshold; the threshold outputs for apertures at eight or
more angles of rotation are combined in an OR gate to produce a line-drawing
output from which clutter has been edited. Figure 3-3 summarizes the method
and shows an electronic implementation of the technique.
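A software sketch of lineness editing follows, under stated assumptions. The clipped input is a 0/1 array. Each aperture is a strip of +1 weights one element wide, flanked on both sides by one-element-wide strips of -1 weights. An aperture fires when its response reaches a hypothetical fraction (0.8) of its number of positive weights. These dimensions and the threshold rule are illustrative and are not taken from Figure 3-3.

```python
import math

def elongated_aperture(angle_deg, half_length=3):
    """Hypothetical line aperture as {(dx, dy): weight}: +1 on a strip one element
    wide along angle_deg, -1 on the two flanking strips, nothing elsewhere."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    reach = half_length + 2
    weights = {}
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            along = dx * c + dy * s      # distance along the aperture axis
            across = -dx * s + dy * c    # distance across it
            if abs(along) > half_length:
                continue
            if abs(across) <= 0.5:
                weights[(dx, dy)] = 1
            elif abs(across) <= 1.5:
                weights[(dx, dy)] = -1
    return weights

def lineness_edit(binary, n_angles=8, half_length=3, fraction=0.8):
    """OR of thresholded aperture responses at n_angles orientations over 180 deg."""
    rows, cols = len(binary), len(binary[0])
    apertures = []
    for k in range(n_angles):
        ap = elongated_aperture(k * 180.0 / n_angles, half_length)
        positives = sum(1 for wt in ap.values() if wt > 0)
        apertures.append((ap, fraction * positives))  # aperture and its threshold
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for ap, threshold in apertures:
                total = 0
                for (dx, dy), wt in ap.items():
                    xx, yy = x + dx, y + dy
                    if 0 <= xx < cols and 0 <= yy < rows:  # outside counts as 0
                        total += wt * binary[yy][xx]
                if total >= threshold:
                    out[y][x] = 1
                    break  # OR gate: one firing aperture is enough
    return out
```

A solid patch of ONEs, standing in for dense texture such as forest, drives every aperture negative because the flanking strips outweigh the central strip. A thin line, by contrast, excites the aperture aligned with it.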

Silhouetting 

A prenormalization technique reported by Holmes, Leland, and Richmond
of the Cornell Aeronautical Laboratory¹ attempts to isolate the object from the
background and thus form a silhouette, which can be oriented in standard position
and subsequently categorized. The technique works well with objects placed on
an uncluttered, contrasting background and, as one might expect, hardly at all
on a highly cluttered background that has approximately the same gray-scale
value as the object. The method consists of computing the average brightness
over a square "picture frame" aperture and comparing it with the brightness
of the point at the center of the square. A ONE output is recorded whenever
the difference of these brightness values exceeds a threshold whose value is
proportional to the standard deviation of the brightness values within the aperture.
This provides a form of automatic gain control on the decision, depending on
the range of brightness values in the area. The output from this prenormalization
operation is a binary image in which discrete objects tend to be silhouetted.
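The frame-averaging rule just described can be sketched as below. The sketch takes the frame to be the one-element-wide border of a square of half-width r, compares the difference in absolute value, and treats the proportionality constant k as a free parameter. These are assumptions for illustration rather than details of the Cornell design. `statistics.pstdev` supplies the standard deviation of the frame values.

```python
import statistics

def frame_pixels(image, x, y, r):
    """Brightness values on the border of the (2r+1) x (2r+1) square centred on (x, y)."""
    vals = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if max(abs(dx), abs(dy)) == r:
                vals.append(image[y + dy][x + dx])
    return vals

def silhouette(image, r=2, k=2.0, min_threshold=0.0):
    """ONE where the centre differs from the frame mean by more than k frame std devs."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            frame = frame_pixels(image, x, y, r)
            mean = statistics.mean(frame)
            threshold = max(k * statistics.pstdev(frame), min_threshold)
            if abs(image[y][x] - mean) > threshold:
                out[y][x] = 1
    return out
```

When the frame is larger than the object, the frame around a point inside a bright object lies entirely on the background, so the center stands out. Near the object, the frame picks up some bright values and its standard deviation rises, which raises the threshold; this is the automatic gain control mentioned above. In a perfectly uniform frame, the threshold falls to `min_threshold`, so a small floor may be needed to guard against noise.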


1. W. S. Holmes, H. R. Leland, and G. E. Richmond, "Design of a Photo
Interpretation Automaton," Proceedings of the 1962 Fall Joint Computer
Conference, pp. 27-35.
Extraction of Additional Information from Gray-Scale Imagery


While the detail detection, lineness editing, and silhouetting techniques
are useful in recording fundamental shape information about the image, they
may ignore much pertinent information present in the gray-scale data. These
processes make use of the high-spatial-frequency content of the retinal data.
The lower-frequency retinal data may also supply much information about the
nature of the decision area; for instance, oil tanks usually appear very
bright and of high contrast with respect to their surroundings. Target
recognition devices should use all such pertinent information. When images are
preprocessed into gradient or lineness images, for example, it is important that
these other recognition criteria present in the gray-scale image be extracted
for later input to the recognition logic rather than discarded.

3.3 Intermediate Processes - Feature Extraction


General 

The function of the parameter measurement unit shown in Figure 3-2
is to perform a number of intermediate transformations on the preprocessed
retinal data. The outputs of these transformations are a new set of random
variables. The transformations are selected to have two primary properties:
first, they must preserve the target information; second, they must present
that information in terms of marginal probabilities and joint probabilities of
pair states, so that the resulting discriminant function has a simple form and
may be readily computed. The difficulty in selecting the transformation
functions is that neither a comprehensive analytical technique for doing so nor
any adequate statistical description of imagery from which such an analysis
might be derived has yet been developed. Consequently, the selection of good
transforms may begin with a list of candidate transforms chosen on the basis
of a best estimate of utility. The transformations may relate to basic
characteristic "features" of the imagery, or they may be selected according to
some arbitrary rule or random process. After compiling the list, the problem
is to select the good transforms. Some analytical tools are available for this
purpose; they are described in Section 6. The remainder of this section deals
with various general classes of transform logic and how they might be applied
to the problem.

There are two approaches to the design of features for incorporation
in a target recognition system. The first is to design a universal vocabulary
of features that will be generally applicable regardless of the target classes
involved. The advantages are economy of implementation and flexibility of
application: if new target classes are added, the only additional circuitry
required is a final decision layer operating on the feature recognition logic
outputs. As yet, no adequate set of universal features has been developed.
Based on subjective analysis of typical imagery, certain basic features have
been suggested, such as straight edges, circles, and other simple geometrical
and textural properties, but detailed specification of optimum logic to extract
such features has not yet been developed. The use of "random" masks represents
another attempt at generating a universal vocabulary of "features," based on
zoomorphic analogies (modeling of animal processes).

The second approach to feature design is to provide a specific set of
features for each pattern class to be recognized. Statistical techniques can be
applied effectively here, and there are many ways in which they may be used.
One example is as follows.

The target decision area is divided into a number of feature areas.
In each area, the coefficients of a linear or quadratic discriminant of
the form given in Equation 3-7 may be computed from sample values of the
input elements for a number of images representing the various pattern
classifications. The various methods of calculating the coefficients, which are
amenable to programming on a digital computer, are outlined later and detailed
in Section 6.

A difficult problem in visual pattern recognition is attaining feature
detection that provides adequate tolerance to local translations and/or size
and shape variations of patterns within a single class. Possible solutions to
this problem include the following.

1. Local-area threshold logic networks with broad tolerance to the
effects of feature translation. However, sufficient tolerance
is not always attainable.

2. Use of the approach of Liu and Kamentsky¹ in the process of
local-area registry testing.

3. Duplication of the feature detection logic in overlapping fashion
to ensure feature detection in the presence of translation.
Alternative networks to implement this scheme are illustrated
in Figure 3-4, "Techniques for the Duplication of Feature Logic."
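Item 3 can be sketched in a few lines. Here a feature detector is a threshold-logic template: weights are summed over the registered image elements and the sum is compared with a threshold. Copies of the detector are displaced by up to a hypothetical `spread` of elements around the nominal registry position and combined in an OR gate, as in panel (b) of Figure 3-4. The function names and the treatment of elements outside the image as zero are assumptions.

```python
def detector_fires(image, template, x0, y0, threshold):
    """Threshold-logic feature detector registered with its top-left corner at (x0, y0).
    template[j][i] holds the weights; elements outside the image count as 0."""
    total = 0
    for j, row in enumerate(template):
        for i, w in enumerate(row):
            yy, xx = y0 + j, x0 + i
            if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                total += w * image[yy][xx]
    return total >= threshold

def duplicated_detector(image, template, x0, y0, threshold, spread=1):
    """OR of identical detectors displaced by up to `spread` elements in each direction."""
    return any(detector_fires(image, template, x0 + dx, y0 + dy, threshold)
               for dy in range(-spread, spread + 1)
               for dx in range(-spread, spread + 1))

# Example: an L-shaped corner lying one element to the right of the nominal position (2, 2).
corner = [[1, 1], [1, -1]]
image = [[0] * 6 for _ in range(6)]
image[2][3] = image[2][4] = image[3][3] = 1
single = detector_fires(image, corner, 2, 2, threshold=3)          # False
tolerant = duplicated_detector(image, corner, 2, 2, threshold=3)   # True
```

The single detector misses the displaced corner, but the duplicated, OR-combined set catches it. Passing each copy's output separately to the discriminant function would instead correspond to panel (a) of Figure 3-4.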


1. L. A. Kamentsky and C. N. Liu, "Computer-Automated Design of Multifont
Print Recognition Logic," IBM Journal of Research and Development,
Vol. 7, No. 1, Jan. 1963, pp. 2-13.
Figure 3-4 Techniques for the Duplication of Feature Logic [figure: (a) identical
feature detectors, each followed by a threshold, displaced in registry over a small
local area of the image where the feature is anticipated; (b) feature detectors
displaced in registry as above but combined in an OR gate before use in the
discriminant function]


The basic template matching technique uses a weighted linear
summation of the preprocessed retinal data, as described by

$$f(x, y) = \sum_i \sum_j w_{ij}\, r(x + i,\, y + j)$$

where $w_{ij}$ is the weighting function and $r(x, y)$ represents the preprocessed
retinal brightness data, that is, the result of the preprocessing operations
described in Section 3.2. The thresholded output is given by

$$F(x, y) = \begin{cases} +1 & \text{if } f(x, y) > \theta \\ -1 & \text{if } f(x, y) \le \theta \end{cases}$$

where θ is a previously established threshold.


For recognition of simple geometric "features" such as straight edges,
right angles, or circles, the weights may be chosen by the designer to match a
particular geometric configuration. For example, the recognition of horizontal
straight edges could be based on a linear template applied to thresholded
gradient-magnitude data, where the template has the following form.
Then


$f_{\text{horiz}}(x, y) = w \cdots$
where $f_{\text{horiz}}(x, y)$ can range in value from +40w to -40w. An area of uniform
brightness, or a contrast edge not approximately parallel to the straight-line
mask, will result in $f_{\text{horiz}}(x, y) \approx 0$. Therefore, for a threshold well
above zero but somewhat less than 40 (in units of w), 25 for example, this
template will be a reasonably good straight-edge segment detector. The thresholded
output from such a feature template matching unit will be an array of plus ones
(corresponding to a detection) and minus ones (no detection) covering every
position (x, y) in which the template was registered.
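The template itself has not survived in this copy, so the sketch below uses a hypothetical layout chosen only to match the stated range. The input is assumed to be thresholded gradient magnitude coded as +1 (edge) and -1 (no edge). The template has 20 weights of +w on a central row and 10 weights of -w on each flanking row, 40 non-zero weights in all. `template_sum` computes the weighted sum f(x, y) over the registered elements, and `template_match` applies the threshold θ at every position of registry.

```python
def horizontal_edge_template(w=1):
    """Hypothetical 3 x 20 horizontal-edge template with 40 non-zero weights:
    +w along the central row, -w on the middle ten elements of each flanking row."""
    centre = [w] * 20
    flank = [0] * 5 + [-w] * 10 + [0] * 5
    return [flank, centre, list(flank)]

def template_sum(r, weights, x, y):
    """f(x, y): weighted sum of the data r under the template registered at (x, y)."""
    return sum(wt * r[y + j][x + i]
               for j, row in enumerate(weights)
               for i, wt in enumerate(row))

def template_match(r, weights, theta):
    """Thresholded output, +1 where f(x, y) > theta and -1 elsewhere, for every
    registry position at which the template lies wholly inside r."""
    h, w = len(weights), len(weights[0])
    return [[1 if template_sum(r, weights, x, y) > theta else -1
             for x in range(len(r[0]) - w + 1)]
            for y in range(len(r) - h + 1)]
```

With θ = 25 (taking w = 1), a full-length horizontal edge on the central row gives f = 40 and a detection, and a uniform area gives 0. A vertical edge crossing the template gives about -2 (or +2 when it misses the flanking strips), so it is not detected either.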


The output from any location $(x_j, y_j)$ will be a particular sample of a
random variable $f(x_j, y_j)$, with a high probability that $f(x_j, y_j) = +1$ if an
edge of a man-made object is present there and $-1$ if the location is natural
terrain. The spatial distribution of f(x, y) over the decision area will be
effective in determining whether a specific class is present.


This template matching logic is identical in structure to the A-units of
the Perceptron. The usual procedure is to generate an output from each A-unit
for only one position of registry within the decision area, but this is not a
fundamental limitation: each random mask can be matched to the preprocessed
retinal data in many (or every) position of registry. The principal
characteristic of the A-units in Perceptron applications is that the weights and
connections to the retinal elements are not designed to correlate with "features"
but are chosen according to some random process. This process can impose
various constraints, such as "choose points that are contiguous along a curving
line," and still represent a random selection procedure. The random variables
$x_j = +1$ or $-1$ output by the A-units, having no overt functional or semantic
relationship to the input data, are presumed to be uncorrelated, or at least
correlated similarly (equal covariance matrices), in the signal-plus-noise (target)
and pure-noise (no target) cases. Under these assumptions, the optimum
decision logic is a linear discriminant function of the form $\sum_j a_j x_j$, whose
coefficients may be calculated from the responses of the masks to a sample set of
images. The determination of the weights $a_j$, whether carried out
by statistical or adaptive procedures, provides a selection process for the
random masks. Those that respond fairly uniformly to both the targets and
non-targets in the training sample will have very low weights and will be
effectively eliminated. The masks that respond most often to only one of
the two classes in the sample set will have high weights and will be retained.
The principal limitation of these masks is that the amount of information
contributed by any relatively small set of them may be inadequate for good
target recognition. Further, as additional A-units are used, the
cross-correlation coefficients may be such that the new units contribute no new
information to a decision based on a linear discriminant function.
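One statistical procedure for the weights $a_j$ can be sketched as follows. The sketch assumes that the A-unit outputs are independent given the class. In that case, the log-likelihood ratio is exactly linear in the ±1 outputs, with $a_j = \tfrac12 \ln[p_{1j}(1-p_{0j}) / (p_{0j}(1-p_{1j}))]$, where $p_{1j}$ and $p_{0j}$ are the probabilities that unit j outputs +1 on targets and on non-targets, respectively. The random-mask generator, the Laplace smoothing, and all names are hypothetical illustrations, not the procedure of Section 6.

```python
import math
import random

def make_random_mask(n_pixels, k, rng):
    """Hypothetical A-unit: k randomly chosen retinal elements with random +/-1 weights."""
    return [(rng.randrange(n_pixels), rng.choice((-1, 1))) for _ in range(k)]

def a_unit_output(mask, image, theta=0):
    """+1 if the weighted sum over the mask's elements exceeds theta, else -1."""
    return 1 if sum(w * image[p] for p, w in mask) > theta else -1

def discriminant_weights(outputs_target, outputs_clutter):
    """Coefficients a_j of sum_j a_j x_j, assuming independent A-unit outputs.

    outputs_* are lists of per-image response vectors with entries +1/-1.
    """
    n_units = len(outputs_target[0])
    weights = []
    for j in range(n_units):
        # Laplace-smoothed probability that unit j outputs +1 for each class.
        p1 = (sum(v[j] == 1 for v in outputs_target) + 1) / (len(outputs_target) + 2)
        p0 = (sum(v[j] == 1 for v in outputs_clutter) + 1) / (len(outputs_clutter) + 2)
        weights.append(0.5 * math.log(p1 * (1 - p0) / (p0 * (1 - p1))))
    return weights

def discriminant(weights, outputs):
    """Linear discriminant sum_j a_j x_j; positive values favour 'target'."""
    return sum(a * x for a, x in zip(weights, outputs))
```

A unit that fires equally often on targets and non-targets gets $p_{1j} = p_{0j}$ and therefore a weight of zero, which is the selection effect described above. A unit that fires mainly on one class gets a large positive or negative weight.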

The  spatial  filtering  operation  described  above  can  be  applied  either 
in  the  spatial  frequency  domain  by  multiplying  the  signal  transforms  using  a 
spatial  frequency  filter,  or  in  the  space  domain,  through  a  cross-correlation 
of  a  weighting  function  with  the  transformed  or  raw  retinal  data  in  every 
position  of  alignment  (see  Appendix  B).  Selection  of  a  preferred  technique 
can  be  made  solely  on  the  basis  of  feasibility  and  cost  of  implementation. 
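The equivalence of the two routes described above, filtering in the spatial-frequency domain and cross-correlating in the space domain, can be checked numerically. For brevity, the sketch is one-dimensional and uses circular (wrap-around) correlation, with a direct discrete Fourier transform written out using `cmath`; the two-dimensional case follows the same algebra. Multiplying the transform of the data by the complex conjugate of the transform of the weighting function corresponds to cross-correlation in every position of alignment.

```python
import cmath

def dft(x, inverse=False):
    """Direct (O(N^2)) discrete Fourier transform, or its inverse, of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def correlate_space(r, w):
    """Circular cross-correlation c[k] = sum_t w[t] * r[t + k] in every alignment k."""
    n = len(r)
    return [sum(w[t] * r[(t + k) % n] for t in range(n)) for k in range(n)]

def correlate_frequency(r, w):
    """Same correlation via the spatial-frequency domain: IDFT(conj(W) * R)."""
    R, W = dft(r), dft(w)
    c = dft([Wk.conjugate() * Rk for Wk, Rk in zip(W, R)], inverse=True)
    return [v.real for v in c]
```

Both functions expect the weighting function to be zero-padded to the length of the data. They agree to within rounding error, which is why the choice between them can rest on feasibility and cost of implementation alone.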

There are several equivalent hardware implementations for such a filtering
operation. Optical techniques, e.g., lensless correlographs and coherent-light
optical spatial filters, provide parallel access to an area of the input
image and can perform the cross-correlation of a single weighting function
simultaneously over all possible alignments. Electronic techniques based
on scanning of the input image provide sequential access to the points of the
image and must be used in conjunction with a dynamic memory such as a
delay line or shift register. With such a memory, many weighting functions
can be cross-correlated simultaneously at any given moment in a single
relative alignment with the image, with all possible alignments, or positions
of registry, of the image being accomplished time-sequentially.

Digital  Design  Techniques 

The preceding discussion has emphasized the use of threshold logic
to detect individual features to be combined for a subsequent decision.
Threshold logic is attractive in pattern recognition because it can interpolate
between samples when only one or two bits, or picture elements, differ from
one sample to another. Conventional parallel digital logic, however, is often
easier to implement and service. With digital logic, once a satisfactory design
has been achieved, there are no problems of threshold drift and adjustment.

On  the  other  hand,  digital  logic  does  not  have  any  inherent  ability  to 
interpolate  from  known  samples  to  unknown  or  unspecified  samples.  Digital 
logic  works  only  to  the  extent  that  the  designer  or  design  algorithm  is  able  to 
foresee  and  provide  for  all  possible  logical  alternatives. 

Little progress has been made over the last few years in designing
general-purpose algorithms to produce digital logic that extrapolates from a
training sequence to provide for unknowns. Carne, working with the Artron
at Melpar,¹ and Powers and Vernot at Philco² have attempted to design
general-purpose adaptive logic for pattern recognition applications. In each
case, the result has been a device that can memorize a training sequence
perfectly but is unable to do much better than chance on unknowns. Algorithms
have been designed successfully at Philco to handle digital design for specific
problems, such as recognition of hand-printed alphanumeric characters, but
each new problem has required a new algorithm.


One approach to digital design specifies the problem by means of a
truth table (see Figure 3-5). Once the truth table is specified, it is a relatively
simple matter to implement it with digital logic. The chief problem with this
technique is that if N digital inputs are available, the truth table has $2^N$
entries, all of which must be filled to specify the problem. Usually the available
training sample is too small to fill all entries.


Decision tree techniques for pattern recognition are well known.
Basically, a decision tree consists of a series of dichotomous decisions placed
on a sorting path, as shown in Figure 3-5b. At each decision, the path forks
into two alternatives. A sample for test enters the tree at the top and, after
passing a series of tests, ends up at a terminal classification. Each junction
usually represents the testing of a particular bit in an input code. A
well-designed sequential decision tree tests only those bits necessary for
classification; the remaining bits are ignored. A decision tree, however, need
not represent a sequential process. All tests may be performed simultaneously
and their outputs combined by parallel logic to achieve the same terminal
classification, although not every test is used in every classification.
Feigenbaum and Simon³,⁴ have programmed a general-purpose algorithm for
the design of a decision tree from a training sequence. The routine is part of
a program called EPAM (Elementary Perceiver and Memorizer). The EPAM
program has been used thus far to associate written and verbal representations


1. E. B. Carne, E. M. Connolly, P. H. Halpern, and B. A. Logan, "A Self-
Organizing Binary Logical Element," Biological Prototypes and Synthetic
Systems, pp. >11-330, Plenum Press, N.Y.

2. R. D. Vernot and E. N. Powers, "A Tunnel-Diode Adaptive Logic Net,"
Proceedings of the 1962 International Solid State Conference.

3. E. A. Feigenbaum and H. A. Simon, "Performance of a Reading Task by an
Elementary Perceiving and Memorizing Program," RAND Report P-2358.

4. E. A. Feigenbaum, "The Simulation of Verbal Learning Behavior,"
Proceedings of the Western Joint Computer Conference, Vol. 19, 1961,
pp. 121-132.
(a) Example of a classification problem expressed by a three-variable truth
table, in which the classification for all states is specified by an exhaustive
training sequence for three input variables, X1, X2, and X3.

(b) Decision tree, designed by Feigenbaum's algorithm, to classify the truth
table entries. The tree discloses that input variable X2 is not needed for
classification. Logical relations expressed by the decision tree:

$$\text{I} = X_1 \cdot X_3 + \bar{X}_1 \cdot \bar{X}_3, \qquad \text{II} = \bar{X}_1 \cdot X_3 + X_1 \cdot \bar{X}_3$$

where · indicates AND, + indicates OR, and an overbar indicates NOT.
Classification output: 0 for class I, 1 for class II.

(c) Resulting logical design using parallel logic.

Figure 3-5 Digital Design Approach to Feature Extraction


of nonsense syllables. EPAM has been observed to make mistakes similar to
those made by humans on similar problems. No data are available, however,
on its ability to generalize to unknown samples.

Some investigators have avoided the problem of adaptive digital design
by means of constrained random generation of digital combinations of binary
input variables. The first constraint chosen is the number of input variables
to be considered at any one time in logical combinations. Thus, for a particular
choice of number n, a series of n-tuples is generated. When n equals 2, the
logical states of binary picture-element pairs are considered; when n equals 3,
triplets; and so forth.

Once the n-tuples are generated, their digital outputs are tabulated
over a training sequence and weighted by some rule (usually statistical in nature)
to be combined in a subsequent decision. These techniques are close relatives
of perceptron-like devices and of the kind of threshold feature detectors
considered in this report, except that the randomly chosen n picture elements
are combined digitally rather than by threshold logic. If a separate weight
set and threshold were provided for each n-tuple state recorded, if all weights
were set to plus or minus one, and if the thresholds were set to the highest
possible value, the two techniques would become identical.
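The equivalence stated in the last sentence can be checked exhaustively for small n. In the sketch below, picture elements are 0/1. The threshold gate for a recorded state takes weight +1 where the state has a ONE and -1 where it has a ZERO. Its threshold is the highest value the weighted sum can reach, namely the number of ONEs in the state, so the gate fires only on an exact match. The function names are illustrative.

```python
from itertools import product

def digital_state_match(inputs, state):
    """Digital n-tuple logic: ONE when the n inputs exactly equal the recorded state."""
    return int(tuple(inputs) == tuple(state))

def threshold_gate_for_state(state):
    """Weights of +1/-1 and the highest attainable threshold for a recorded state."""
    weights = [1 if s else -1 for s in state]
    threshold = sum(state)  # maximum possible weighted sum of 0/1 inputs
    return weights, threshold

def threshold_gate(inputs, weights, threshold):
    """Threshold logic: ONE when the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

# Exhaustive check for n = 3: the two techniques agree on every input/state pair.
n = 3
for state in product((0, 1), repeat=n):
    w, t = threshold_gate_for_state(state)
    for inputs in product((0, 1), repeat=n):
        assert threshold_gate(inputs, w, t) == digital_state_match(inputs, state)
```

Lowering the threshold below its maximum is what gives threshold logic the ability, mentioned earlier, to tolerate one or two mismatched picture elements. The digital state match has no such tolerance.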


Browning  and  Bledsoe  *  experimented  with  n-tuples,  with  n  having 
values  as  high  as  six.  In  these  works,  all  states  of  the  n  -tuple  were  recorded 
and  used;  thus,  for  n  =  6,  ?.  or  64  states  were  recorded  for  each  n- tuple. 

Simple  Bayes'  weighting  was  used  and  the  results  were  excellent  for  machine 
print  and  encouraging  for  handprint.  Uhr  and 'Vossler  ^  have  designed  a  program 
which  records  only  one  state  each  of  many  25-tuples.  The  program  incorporates 
rules  for  discarding  the  least  useful  digital  combinations  and  generating  new 
ones.  This  program  has  demonstrated  some  success  in  recognising  cartoon 
characters  and  recognizing  speech. in  binary  sonogram  representation. 
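The n-tuple scheme described above can be sketched in a few lines of Python. This is an illustrative sketch, not the Bledsoe-Browning program. It assumes binary images flattened to lists of 0/1 picture elements and draws each n-tuple as an unconstrained random set of element indices. During training it records all 2ⁿ states of every n-tuple. It then uses summed log state frequencies, with add-one smoothing, as a stand-in for "simple Bayes' weighting". All function names are hypothetical.

```python
import math
import random


def make_ntuples(num_pixels, n, count, seed=0):
    """Choose `count` n-tuples, each a random set of n distinct element indices."""
    rng = random.Random(seed)
    return [rng.sample(range(num_pixels), n) for _ in range(count)]


def tuple_state(image, tup):
    """Read the n binary elements of one n-tuple as an integer in [0, 2**n)."""
    state = 0
    for idx in tup:
        state = (state << 1) | image[idx]
    return state


def train(samples, tuples, n):
    """Tabulate, per class, how often each n-tuple showed each of its 2**n states."""
    counts, totals = {}, {}
    for image, label in samples:
        table = counts.setdefault(label, [[0] * 2 ** n for _ in tuples])
        totals[label] = totals.get(label, 0) + 1
        for k, tup in enumerate(tuples):
            table[k][tuple_state(image, tup)] += 1
    return counts, totals


def classify(image, tuples, n, counts, totals):
    """Return the class with the largest summed log state frequency."""
    best_label, best_score = None, -math.inf
    for label, table in counts.items():
        score = 0.0
        for k, tup in enumerate(tuples):
            freq = table[k][tuple_state(image, tup)]
            # Add-one smoothing keeps a never-seen state from scoring minus infinity.
            score += math.log((freq + 1) / (totals[label] + 2 ** n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical 4 x 4 training patterns, flattened row by row.
vertical = [1 if c == 1 else 0 for r in range(4) for c in range(4)]
horizontal = [1 if r == 1 else 0 for r in range(4) for c in range(4)]
tuples = make_ntuples(16, 3, 20, seed=1)
counts, totals = train([(vertical, "V"), (horizontal, "H")], tuples, 3)
print(classify(vertical, tuples, 3, counts, totals))
```

Recording every state, as here, costs 2ⁿ counters per n-tuple and class. That is why a scheme with 25-tuples, such as Uhr and Vossler's, must keep only selected states.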

1. W. W. Bledsoe and C. L. Bisson, "Improved Memory Matrices for the n-Tuple Pattern Recognition Method," internal report, Panoramic Research, Inc.

2. W. W. Bledsoe and I. Browning, "Pattern Recognition and Reading by Machine," Proceedings of the Eastern Joint Computer Conference, 1959.

3. L. Uhr and C. Vossler, "A Pattern Recognition Program That Generates, Evaluates, and Adjusts Its Own Operators," Proceedings of the Western Joint Computer Conference, 1961, pp. 555-569.
Promising work in n-tuples has recently been published by Kamentsky and Liu,¹ who have expanded on the n-tuple concept in a multiple-font print-reading application. The n-tuples of Kamentsky and Liu differ from those of Browning and Bledsoe in that registry testing and memory have been added to the n-bit states. The Kamentsky n-tuple is a constrained random choice of seven binary picture elements in the input image. These seven elements are given ZERO and ONE designations. A ONE output is recorded for the n-tuple if all seven ZERO-ONE input states are satisfied simultaneously in any position of translational registry over the input character. This registry-testing aspect is quite important, since it increases the probability of a ONE response by an order of magnitude and so increases the power of the n-tuple to transmit information. The choice of the 7-tuple over other sizes was based on experiments which determined the information content of n-tuples for various values of n. The results of Kamentsky and Liu are sufficiently impressive for the alphanumeric problem to merit further investigation of n-tuples in object recognition applications. Points which need further investigation are:

1. Relative efficiency of logical n-tuples versus threshold n-tuples.

The logical AND used by Kamentsky to combine the elements of his n-tuples is equivalent to a simple linear summation of the elements, with a threshold on the output set to fire only when all of the elements are active. Therefore, this point of investigation, in more general terms, is to determine how the efficiency of a threshold n-tuple varies as a function of its threshold.

2.  Optimum  constraints  for  n-tuple  generation. 

3. Efficiency of n-tuples versus the amount of registry testing used.
Local-area registry testing of a digital n-tuple increases the probability of a ONE response and so increases its ability to transmit information. On the other hand, further registry testing over a wider area causes information loss, to the extent that relative feature position may be important to recognition. The relative trade-offs of these considerations need to be investigated.
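The sketch below illustrates registry testing, and the equivalence between the logical AND and a threshold unit set to n. It assumes each n-tuple element is a hypothetical (row offset, column offset, required value) triple with non-negative offsets. It takes registry testing to mean trying every translation that keeps the n-tuple inside the image. It is not the Kamentsky and Liu design. Lowering `threshold` below n turns the logical n-tuple into the threshold n-tuple of point 1.

```python
def ntuple_fires(image, elements, threshold=None):
    """Registry-tested n-tuple: fire if, in ANY translation, enough elements match.

    image     -- list of rows of 0/1 picture elements
    elements  -- (row_offset, col_offset, required_value) triples, offsets >= 0
    threshold -- number of elements that must match; the default, len(elements),
                 is the logical AND of all the ZERO/ONE conditions
    """
    if threshold is None:
        threshold = len(elements)
    rows, cols = len(image), len(image[0])
    max_dr = max(dr for dr, _, _ in elements)
    max_dc = max(dc for _, dc, _ in elements)
    # Try every translation that keeps the whole n-tuple inside the image.
    for r0 in range(rows - max_dr):
        for c0 in range(cols - max_dc):
            matched = sum(1 for dr, dc, want in elements
                          if image[r0 + dr][c0 + dc] == want)
            if matched >= threshold:
                return True
    return False


# A hypothetical 7-tuple: five ONE elements forming an "L" plus two ZERO elements.
L_TUPLE = [(0, 0, 1), (1, 0, 1), (2, 0, 1), (2, 1, 1), (2, 2, 1), (0, 1, 0), (0, 2, 0)]


def blank(rows=5, cols=5):
    return [[0] * cols for _ in range(rows)]


def draw_l(image, r, c):
    """Write the ONE elements of L_TUPLE into `image` with top-left corner (r, c)."""
    for dr, dc, want in L_TUPLE:
        if want:
            image[r + dr][c + dc] = 1
    return image


print(ntuple_fires(draw_l(blank(), 1, 1), L_TUPLE))  # True
print(ntuple_fires(draw_l(blank(), 2, 2), L_TUPLE))  # True: same feature, translated
print(ntuple_fires(blank(), L_TUPLE))                # False
```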

1. L. A. Kamentsky and C. N. Liu, "Computer-Automated Design of Multifont Print Recognition Logic," IBM Journal of Research and Development, Vol. 7, No. 1, Jan. 1963, pp. 2-13.

Most logical processes can, in theory, be represented by parallel logic networks. As a practical matter, however, many useful recognition routines are impractical to implement with parallel logic because of the excessive number of components required. Complex image processing techniques often can be implemented most economically by some form of sequential logic. Much of the initial work in pattern recognition used sequential logic.¹,² Routines were programmed for the general-purpose digital computer which smoothed edges, eliminated clutter, and traced contours to extract major geometric features from alphanumeric characters. Work at Cornell Aeronautical Laboratory³ on the isolation of silhouettes from gray-scale imagery is an example of sequential routines applied to gray-scale object recognition. Sequential routines, of course, are relatively slow. Parallel and sequential techniques may be combined as a compromise between speed and equipment complexity.

For example, it may be desirable to obtain tentative identifications of objects using parallel logic, and subsequently to verify each identification with a more sophisticated sequential logic operation. Contour following is an important example of a sequential processing technique. There is little doubt that human eye motions follow contours in tracing paths through clutter.⁴ Contour following may be an efficient technique for final recognition after the number of candidate patterns has been reduced by high-speed parallel preprocessing.
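Contour following can be illustrated with the Moore-neighbour boundary-following procedure found in image-processing textbooks. This is one common variant, assumed here for illustration and not taken from the cited work. The image is a list of rows of 0/1 values. The trace starts at the first ONE element met in raster order and stops when it is back at the start and about to repeat its first move.

```python
# Clockwise ring of 8 neighbours as (row, col) offsets, starting at the west neighbour.
RING = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_contour(image):
    """Follow the outer contour of the first ONE region met in raster order."""
    rows, cols = len(image), len(image[0])

    def is_one(r, c):
        return 0 <= r < rows and 0 <= c < cols and image[r][c] == 1

    start = next(((r, c) for r in range(rows) for c in range(cols)
                  if image[r][c] == 1), None)
    if start is None:
        return []
    p, back = start, (start[0], start[1] - 1)  # west of the first ONE is background
    contour = [start]
    first_move = None
    while True:
        k0 = RING.index((back[0] - p[0], back[1] - p[1]))
        prev, nxt = back, None
        for k in range(1, 9):  # sweep clockwise around p, starting just after `back`
            dr, dc = RING[(k0 + k) % 8]
            cand = (p[0] + dr, p[1] + dc)
            if is_one(*cand):
                nxt = cand
                break
            prev = cand
        if nxt is None:  # an isolated single element
            return contour
        if first_move is None:
            first_move = (p, nxt)
        elif (p, nxt) == first_move:  # back at the start, about to repeat the first move
            return contour[:-1]
        contour.append(nxt)
        p, back = nxt, prev  # the last background cell checked becomes the new backtrack


block = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(trace_contour(block))  # [(1, 1), (1, 2), (2, 2), (2, 1)]
```

Each step examines only the eight neighbours of the current element. The work therefore grows with contour length rather than with image area, which suits the role suggested above of final checking after parallel preprocessing.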

Translationally Invariant Feature Extraction Techniques

Much attention in the pattern recognition art has been given to the role of two features of the image which are invariant to translation of the input: the Fourier energy spectrum and the autocorrelation function. These two are very closely related, the first being the Fourier transform of the latter. They are also similar in that they destroy the same information in eliminating the effects of translation. The following discussion is concerned with the Fourier energy spectrum only. It should be understood, however, that completely analogous statements can be made about the autocorrelation function because of the transform relationship between the two functions.


1. J. S. Bomba, "Alpha-Numeric Character Recognition Using Local Operations," Proceedings of the Eastern Joint Computer Conference, 1959.

2. S. H. Unger, "Pattern Detection and Recognition," Proceedings of the IRE, October 1959.

3. W. S. Holmes, H. R. Leland, and G. E. Richmond, "Design of a Photo Interpretation Automaton," Proceedings of the 1962 Fall Joint Computer Conference, pp. 27-35.

4. J. R. Platt, "How We See Straight Lines," Scientific American, Vol. 202, No. 6, pp. 121-129, 1960.
The Two-Dimensional Fourier Transform


The expression

$$R(\omega_x, \omega_y) = \iint r(x, y)\, e^{-j(\omega_x x + \omega_y y)}\, dx\, dy \qquad (3\text{-}12)$$


retains all of the information of the input image. Using the inverse transform, one can reconstruct the original image r(x, y) from the transform output R(ω_x, ω_y) with no loss of information, including the exact position of the image with respect to the origin. The image data are contained in both the amplitude and the phase of the transform. To obtain an output which is not a function of input translation, something must be discarded. The information of image location with respect to the origin is contained exclusively in the phase data. Thus, when the Fourier energy spectrum R(ω_x, ω_y) · R*(ω_x, ω_y) = |R(ω_x, ω_y)|² is obtained, the output is independent of input image translation. Unfortunately, more is lost than just total image translation. This single parameter might, for example, be specified by 12 bits: six bits in x and six in y. In discarding the phase data, however, about half of the information in the transform is lost. For a picture of 64 × 64 elements at 3 bits each, which involves some 12,000 bits in all, this amounts to thousands of bits rather than 12. The power spectrum no longer specifies the image completely, and it is impossible to reconstruct the image from the power spectrum alone. In fact, more than one image might well have the same power spectrum. Figure 3-6 shows an example of two one-dimensional functions which have the same power spectrum.
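Both properties can be checked numerically in one dimension. The sketch below assumes a discrete signal with circular (wrap-around) translation, so that the discrete Fourier transform applies exactly. The helper names are illustrative. A translated copy has the same energy spectrum. So does a mirrored copy, which is a different function. A rearranged signal generally has a different spectrum.

```python
import cmath


def dft(x):
    """Discrete Fourier transform, written out directly (O(N^2))."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def energy_spectrum(x):
    """|X(m)|^2 = X(m) * conj(X(m)): the phase of every component is discarded."""
    return [abs(v) ** 2 for v in dft(x)]


def shift(x, s):
    """Circular translation by s elements."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def mirror(x):
    """x[-k mod N]: a reflected copy, generally not a translation of x."""
    return [x[-k % len(x)] for k in range(len(x))]


def same(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))


signal = [0, 1, 3, 2, 0, 0, 0, 0]
print(same(energy_spectrum(signal), energy_spectrum(shift(signal, 3))))  # True
print(same(energy_spectrum(signal), energy_spectrum(mirror(signal))))    # True
print(same(energy_spectrum(signal), energy_spectrum([0, 1, 2, 3, 0, 0, 0, 0])))  # False
```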

Figure 3-7 shows a one-dimensional Fourier spectrum plot for a tank in a cluttered background. Here the spectrum plot at each vertical position is the spectrum of a narrow horizontal strip taken across the picture at the same height. Sharp peaks can be observed at the fundamental frequency and harmonics of the periodic bogie suspension. A single peak can also be observed at the height of the upper tread, corresponding to the frequency of the tread links. Figure 3-8 shows two-dimensional Fourier energy spectra of a white rectangle on a black field and of a tank photograph. The spectrum of the rectangular aperture is very nearly ideal. It shows the complete bilateral symmetry, |R(ω)|² = |R(−ω)|², which is to be expected with a purely real input function. The spectrum of the tank, however, lacks the expected symmetry. This indicates that variable emulsion thickness of the input transparency has caused variable phase shift, even with the negative immersed between optical flats in an oil having approximately the same index of refraction. The photographs shown in Figures 3-7 and 3-8 were produced in the Philco laboratories in connection with another contract. Figures 3-9 and 3-10 show the optical set-ups used. Many more photographs would be required to determine whether or not Fourier spectra contain sufficient data for dependable recognition of various classes of objects.

Figure 3-6 Two One-Dimensional Functions Having Identical Fourier Power Spectra

Figure 3-7 One-Dimensional Spectrum of Tank and Background Scene

Figure 3-8 Two-Dimensional Spectra Obtained with Mirror Spectrometer: (a) Spectrum of a Rectangular Aperture; (b) Spectrum of Tank Photo (Showing Film Thickness Effects)

The problem of large-area texture identification appears to be well adapted to the use of Fourier power spectrum techniques. Since no geometric relationships need be preserved in this application, the discarded phase information is of lesser importance.

If either the Fourier power spectrum or the autocorrelation function is to be useful for eliminating translation, the total frame over which the spectrum is evaluated must be larger than the object to be recognized. If, for example, they were the same size, the object would already be centered, and there would be no need for a translationally invariant processing technique. Unfortunately, as the frame is expanded to include more than the target, clutter in the target background adds to and dilutes the target spectrum, making recognition more difficult.

Textural features are recognized by applying linear discriminant functions to the components of the Fourier power spectrum. The output for the textural feature T_j is the weighted summation of all the components of the power spectrum. The component weights can be derived from sample data in the same manner as the coefficients of any other linear discriminant function. Appendix C demonstrates that this is completely equivalent to the following procedure: cross-correlate the input image r(x, y) with each of two weighting functions w_1(x, y) and w_2(x, y), square the outputs, and integrate the difference over the frame:


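$$h_1(x, y) = \iint r(x - \xi, y - \eta)\, w_1(\xi, \eta)\, d\xi\, d\eta, \qquad h_2(x, y) = \iint r(x - \xi, y - \eta)\, w_2(\xi, \eta)\, d\xi\, d\eta \qquad (3\text{-}13)$$

$$T_j = \iint_{\text{frame}} \left[ h_1^2(x, y) - h_2^2(x, y) \right] dx\, dy \qquad (3\text{-}14)$$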
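The equivalence can be checked numerically in one dimension, with sums in place of integrals. This is a minimal sketch, not the derivation of Appendix C. It assumes circular (wrap-around) correlation over an N-element frame, so that the discrete convolution theorem and Parseval's relation hold exactly. It takes the power-spectrum weights to be |W₁|² − |W₂|², where W₁ and W₂ are the transforms of the two weighting functions. The input values and weighting functions are arbitrary.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def correlate(r, w):
    """Circular form of Eq. 3-13: h(x) = sum_k r(x - k) w(k)."""
    n = len(r)
    return [sum(r[(x - k) % n] * w[k] for k in range(n)) for x in range(n)]


def texture_spatial(r, w1, w2):
    """Eq. 3-14 with sums: square both correlator outputs, sum the difference."""
    h1, h2 = correlate(r, w1), correlate(r, w2)
    return sum(a * a - b * b for a, b in zip(h1, h2))


def texture_spectral(r, w1, w2):
    """Weighted sum of power-spectrum components, weights |W1|^2 - |W2|^2."""
    n = len(r)
    power = [abs(v) ** 2 for v in dft(r)]
    weights = [abs(a) ** 2 - abs(b) ** 2 for a, b in zip(dft(w1), dft(w2))]
    # The 1/N factor is Parseval's relation for the unnormalised DFT.
    return sum(p * wt for p, wt in zip(power, weights)) / n


r = [3, 1, 4, 1, 5, 9, 2, 6]
w1 = [1, -1, 0, 0, 0, 0, 0, 0]   # hypothetical: responds to fine texture
w2 = [1, 1, 0, 0, 0, 0, 0, 0]    # hypothetical: responds to coarse variation
print(texture_spatial(r, w1, w2), texture_spectral(r, w1, w2))
```

Both functions give −436 (the spectral one up to rounding). A power-spectrum discriminant with positive and negative weights therefore needs only two correlators, two squarers and an integrator.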
Figure 3-9 One-Dimensional Multichannel Spatial Spectrometer

Figure 3-10 Spatial Spectrometer


To make a texture measurement in a small local area, it is only necessary to reduce the area of the frame. The resulting output is no longer a rigorous equivalent of the weighted Fourier power spectrum measurement. However, the loss of rigor is no worse than that of any other method which measures the Fourier spectrum over a small area.

The textural measurements, then, can be made either electronically or optically by the same basic techniques used for geometric feature extraction, with the addition of a squaring operation and an integration or summation over the frame.
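A local-area version simply restricts the summation to a small sliding frame. The sketch below works on one scan line and uses a single hypothetical weighting function, [1, −1]. It therefore computes only the w₁ half of Eq. 3-14; the second correlator would be handled the same way and subtracted. The scan line and window size are illustrative.

```python
def local_texture_energy(row, weights, window):
    """Sum of squared correlator outputs over each `window`-element local frame."""
    m = len(weights)
    # Correlator output at every position where the weighting function fits.
    filtered = [sum(row[i + k] * weights[k] for k in range(m))
                for i in range(len(row) - m + 1)]
    squared = [v * v for v in filtered]
    return [sum(squared[i:i + window]) for i in range(len(squared) - window + 1)]


scan_line = [5] * 8 + [0, 9] * 4     # a flat region, then a fine striped texture
energy = local_texture_energy(scan_line, [1, -1], window=4)
print(energy[0], energy[-1])  # 0 324
```

The first window lies over the flat region and gives 0. The last lies over the stripes and gives 324, so local frames separate regions that a single whole-line measurement would average together.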

3.4  The  Role  of  Statistical  Decision  Theory 

The  intelligent  application  of  techniques  derived  from  statistical 
decision  theory  is  basic  to  the  solution  of  the  imagery  screening  problem. 
Specific  applications  have  been  touched  upon  briefly  in  earlier  portions  of 
Section  3. 

Statistical decision theory is applicable to problems of the following type:

• Unknown samples are drawn from a universe made up of several classes.

• A set of measurements is made of properties of each unknown sample. Based on these measurements, the sample is to be classified.

• For each class in the universe, the multivariate probability distribution of these property measurements is known, or can be estimated from a set of known samples.

• The "costs" of misclassifying a sample of a certain class into each of the other classes are known or estimated.

• The distribution of each class within the universe is known or estimated.

Under these circumstances, statistical decision theory can provide useful rules for classifying the unknown samples. The specific rules to be used will depend on the nature of the multivariate distributions of the property measurements and on the completeness of the description of the universe. The available techniques fall into two general classes. Parametric techniques assume that the parameters* of the distributions are known or can be assumed. Nonparametric techniques make no assumption about the distributions. The principal techniques are described in Section 6 of this report. The usefulness of any given technique will depend on the specific application.

* e.g., the moments of the distributions
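For a single measurement and two classes, the decision rule implied by these conditions is the minimum-expected-cost (Bayes) rule. The sketch below is parametric and assumes univariate Gaussian class densities. The class names, means, priors and costs are all hypothetical. It chooses the decision whose expected cost is smallest, where each class is weighted by its prior probability times its likelihood.

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def bayes_decide(x, densities, priors, cost):
    """Return the decision with the smallest expected cost.

    densities -- {class: (mean, std)}, assumed Gaussian class-conditional densities
    priors    -- {class: prior probability of the class in the universe}
    cost      -- {(decided, true): cost of deciding `decided` when `true` holds}
    """
    # Prior times likelihood; the common normaliser p(x) does not change the argmin.
    weight = {c: priors[c] * gaussian_pdf(x, *densities[c]) for c in densities}
    risk = {d: sum(cost[(d, t)] * weight[t] for t in densities) for d in densities}
    return min(risk, key=risk.get)


densities = {"target": (10.0, 2.0), "clutter": (0.0, 2.0)}
priors = {"target": 0.01, "clutter": 0.99}
zero_one = {("target", "target"): 0, ("clutter", "clutter"): 0,
            ("clutter", "target"): 1, ("target", "clutter"): 1}
miss_costly = {("target", "target"): 0, ("clutter", "clutter"): 0,
               ("clutter", "target"): 100, ("target", "clutter"): 1}
for x in (1.0, 5.0, 9.0):
    print(x, bayes_decide(x, densities, priors, zero_one),
          bayes_decide(x, densities, priors, miss_costly))
```

At x = 5, midway between the class means, the rare target class loses under equal costs. It wins when a missed target is made 100 times as costly as a false alarm.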

In the processing of gray-scale imagery for target detection, statistical techniques may be applied in a number of ways. Each element of the retinal field of gray-scale or two-level edge-detected values constitutes a random variable. Suppose complete statistics were known describing the multivariate distribution of the retinal element values for each target class and for all non-target images. Then, at least theoretically if not practically, target detection could be performed optimally by a single layer of logic operating on the retinal data. The linear "matched filter" or template-matching logic described earlier is an extremely simple single-layer logic, based on very restrictive assumptions about the distribution of the retinal element data.
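One way to see how restrictive those assumptions are is to take the retinal elements to be independent Gaussian variables with a common variance σ², differing between target and background only in their means. The log-likelihood ratio then reduces exactly to a linear template plus a constant. The sketch below, with hypothetical function names, computes both forms and shows that they agree.

```python
def log_likelihood_ratio(x, target_mean, background_mean, sigma):
    """log p(x | target) - log p(x | background) for independent Gaussian elements."""
    return sum(((xi - bi) ** 2 - (xi - ti) ** 2) / (2 * sigma ** 2)
               for xi, ti, bi in zip(x, target_mean, background_mean))


def matched_filter(x, target_mean, background_mean, sigma):
    """The same quantity written as a linear template (weights) plus a constant."""
    weights = [(ti - bi) / sigma ** 2 for ti, bi in zip(target_mean, background_mean)]
    bias = -sum(ti ** 2 - bi ** 2
                for ti, bi in zip(target_mean, background_mean)) / (2 * sigma ** 2)
    return sum(w * xi for w, xi in zip(weights, x)) + bias


x = [0.5, 2.0, -1.0]
template = [1.0, 2.0, 0.0]
background = [0.0, 0.0, 0.0]
print(log_likelihood_ratio(x, template, background, 0.7),
      matched_filter(x, template, background, 0.7))
```

Any correlation between elements, unequal variances, or non-Gaussian statistics breaks this reduction. The optimum single-layer logic is then no longer a simple template.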

In multilayer systems, the outputs of the feature detectors constitute a set of random variables. Because of the practical problems of implementation, useful features will be those that present the information necessary for classification in terms of much simpler statistical distributions. Section 7 of this report describes an experimental program for investigating statistical techniques applied in two stages: first on the retinal layer to extract features, and then on the feature detector outputs to make the final classification.


SECTION  4 


SYSTEM  IMPLEMENTATION  TECHNIQUES 
4.1 General

There are two general approaches to the implementation of an image screening system. The first, based principally on optical techniques, provides parallel access to a large area of the input imagery. It cross-correlates one weighting function, or template, at a time in all positions of registry. The second approach makes use of image scanning and delay-line or shift-register cross-correlators. It uses sequential access to positions on the retina but provides simultaneous cross-correlation of many templates.
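A scanned correlator of the second kind can be sketched with a tapped delay line. The sketch assumes raster-order scanning, templates of equal size, and a Python `deque` standing in for the delay line or shift register. Taps at delays r·W + c, where W is the scan-line width, present a whole template-sized window at each clock tick, and every template reads the same taps. The function name is hypothetical.

```python
from collections import deque


def stream_correlate(image, templates):
    """Raster-scan `image` once and correlate every template at each valid position.

    Returns {template index: {(top row, left col): correlation value}}.
    """
    height, width = len(image), len(image[0])
    th, tw = len(templates[0]), len(templates[0][0])  # all templates assumed th x tw
    # line[d] holds the sample scanned d clock ticks ago.
    line = deque(maxlen=(th - 1) * width + tw)
    results = {k: {} for k in range(len(templates))}
    for t in range(height * width):
        y, x = divmod(t, width)
        line.appendleft(image[y][x])            # one picture element per clock tick
        if y >= th - 1 and x >= tw - 1:          # window lies wholly on the retina
            for k, tmpl in enumerate(templates):  # every template uses the same taps
                results[k][(y - th + 1, x - tw + 1)] = sum(
                    tmpl[th - 1 - r][tw - 1 - c] * line[r * width + c]
                    for r in range(th) for c in range(tw))
    return results


image = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
templates = [[[1, 0], [0, -1]], [[1, 1], [1, 1]]]
print(stream_correlate(image, templates)[1][(0, 0)])  # 1 + 2 + 6 + 7 = 16
```

The input is scanned only once, however many templates are applied, which is the trade the second approach makes against the optical one.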

The investigation and comparison of the sequential versus the parallel approach is still in progress. The comparison must necessarily be a detailed one, since the investigation must consider the design details and the hardware implementation involved. The present material therefore reports the current status of a continuing program and is in no sense a final evaluation.

4.2 Parallel-Access, Sequential-Masking

Introduction 

Two principal techniques, the lensless correlograph and coherent optical spatial filtering, provide simultaneous correlation of a single template or filter function with the input image in every position of registration. Both have been described in detail in the literature* as single-layer, linear template-matching devices. Their effectiveness in multilayered systems is presently limited by the available sensors.

The  Lensless  Correlograph 

The basic operation of the lensless correlograph (Figures 4-1(a) and 4-1(b)) can be understood in terms of a generalized pinhole camera, with the pinhole replaced by a shaped, weighted aperture. The weights are achieved by controlling the transparency at each point of the aperture in accordance with the weighting function. The aperture may be a continuous function, but the operation of the correlograph may be best understood in terms of discrete cells, or picture elements.

* See bibliography in First Quarterly Report.

Figure 4-1(b) Lensless Correlograph - Shaped Aperture

For a single cell, the operation is identical to that of a pinhole camera. Each point of the input photograph, b(x, y), in the object plane is imaged at a point (ξ, η) on the image plane that is complementary to it with respect to the aperture cell at (u, v). For a correlograph with unity magnification, as in Figure 4-1(a), the elementary output f_e(ξ, η) on the image plane is defined as

$$f_e(\xi, \eta) = b(x, y)\, w\!\left(\frac{\xi - x}{2}, \frac{\eta - y}{2}\right) \qquad (4\text{-}1)$$

or

$$f_e(\xi, \eta) = b(\xi - 2u, \eta - 2v)\, w(u, v) \qquad (4\text{-}2)$$

where w(u, v) is the weighting value at point (u, v).


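For the case of a shaped, weighted aperture, as in Figure 4-1(b), the output is the integral over the aperture of f_e(ξ, η):

$$f(\xi, \eta) = \iint_{\text{aperture}} b(\xi - 2u, \eta - 2v)\, w(u, v)\, du\, dv \qquad (4\text{-}3)$$

or, substituting x = ξ − 2u and y = η − 2v,

$$f(\xi, \eta) = \frac{1}{4} \iint_{A(\xi, \eta)} b(x, y)\, w\!\left(\frac{\xi - x}{2}, \frac{\eta - y}{2}\right) dx\, dy \qquad (4\text{-}4)$$

where A(ξ, η) is the projection of the aperture on the x, y object plane from the point (ξ, η).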


The  resulting  output  is  a  cross- correlation  of  the  aperture  with  the  input 
photograph  in  every  position  of  registry. 
In this implementation, w(u, v) is a positive real function, and the output f(ξ, η) is also positive real. Negative lobes in the weighting function cannot be implemented directly, and "no correlation" corresponds to a positive dc value rather than to zero.

The  lensless  correlograph  produces  a  bright  correlation  peak  in 
light  intensity  at  diagonally  opposite  points  in  the  image  plane.  This  peak 
corresponds  to  the  location  of  a  feature  in  the  object  plane  for  which  a  corre¬ 
lation  filter  is  placed  in  the  aperture  plane.  In  order  to  detect  specific  single 
objects  or  features,  a  thresholding  operation  must  be  performed  at  each 
point  in  the  output  image.  Object  counting  corresponds  to  integrating  the 
thresholded  output  over  a  frame. 
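The correlograph output, its positive dc floor, and the thresholding and counting steps can be sketched in one dimension. This is a discrete sketch of Eq. 4-3 under stated assumptions (unity magnification, integer cell positions, non-negative cell transparencies), not a model of the optics. Each aperture cell u contributes b[ξ − 2u]·w[u]. The pattern that produces a peak therefore appears in the object plane reversed and spread out by a factor of two, as the pinhole geometry implies. The input values are hypothetical.

```python
def correlograph_1d(b, w):
    """Discrete one-dimensional form of Eq. 4-3: f[xi] = sum_u b[xi - 2u] * w[u]."""
    out = []
    for xi in range(len(b) + 2 * (len(w) - 1)):
        out.append(sum(b[xi - 2 * u] * w[u]
                       for u in range(len(w)) if 0 <= xi - 2 * u < len(b)))
    return out


def count_detections(output, threshold):
    """Threshold each output point, then count separate above-threshold runs."""
    count, inside = 0, False
    for value in output:
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


aperture = [1, 2, 3]                 # non-negative cell transparencies
scene = [0] * 20
for x0 in (6, 15):                   # two occurrences of the matching feature
    for u, weight in enumerate(aperture):
        scene[x0 - 2 * u] = weight   # reversed and spread by 2, per the pinhole geometry
dark = correlograph_1d(scene, aperture)
gray = correlograph_1d([v + 1 for v in scene], aperture)  # uniform gray background
print(dark[6], dark[15], count_detections(dark, 10))       # 14 14 2
print(min(gray[4:20]), count_detections(gray, 16))         # 6 2
```

On the gray background no output point falls below 6. That floor is the positive dc value described above, and the threshold must be raised to clear it.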

The practical advantages of the lensless correlograph include its relative simplicity and its ability to test for features independently of their position in the image field, all in one parallel operation. Its disadvantages include:

• the inability to perform non-linear transformations (e.g., gradient magnitude) in a direct way;

• difficulty in obtaining adequate light levels in the output;

• difficulty in obtaining fast, efficient, stable, and uniform parallel thresholding logic sensors for the output stage;

• the inability to incorporate negative weights in the aperture.

Coherent  Optical  Spatial  Filtering 

The coherent optical spatial filtering technique provides spatial filtering directly in the spatial frequency domain. Under ideal conditions, a collimated beam of monochromatic light passed through a transparency and a high-quality lens produces a light distribution in the focal plane that corresponds to the Fourier spatial frequency transform of the transparency. As the parallel rays of light pass through the transparency, they are diffracted, or bent, at each point in proportion to the spatial frequency content at that point. In the focal plane, all the unbent (dc component) rays converge to a point on the optical axis. All the rays bent by a given angle, corresponding to a given spatial frequency, also emerge in parallel from the transparency. The lens converges them to a single off-axis point in the focal plane, whose distance from the optical axis is proportional to the magnitude of the corresponding spatial frequency (see Figure 4-2).

This property can be used to measure the textural parameters of the imagery by sensing various components of the power spectrum.
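The proportionality can be made concrete with the standard small-angle Fourier-optics relation x = λFν, where λ is the wavelength, F the focal length of the lens and ν the spatial frequency. The relation is textbook Fourier optics rather than something taken from this report. The helium-neon wavelength and 500-mm focal length below are illustrative assumptions.

```python
def focal_plane_offset_mm(freq_lines_per_mm, wavelength_nm, focal_length_mm):
    """Distance from the optical axis, x = wavelength * F * frequency (small angles)."""
    wavelength_mm = wavelength_nm * 1e-6
    return wavelength_mm * focal_length_mm * freq_lines_per_mm


for nu in (5, 10, 20):
    print(nu, "lines/mm ->", round(focal_plane_offset_mm(nu, 632.8, 500.0), 3), "mm")
```

Doubling the spatial frequency doubles the distance from the axis. A ring-shaped or wedge-shaped sensor in the focal plane therefore measures a band of spatial frequencies or directions, respectively.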

When a second lens is added to the system, the input transparency is imaged on the "film" plane: the Fourier spectrum in the focal plane is inversely transformed into a reproduction of the input image. When a spatial filter in the form of a transparency is placed in the focal plane, however, the reproduced image is the filtered version of the input image (see Figure 4-3). That is,


$$f(x, y) = \mathcal{F}^{-1}\left[ B(\omega_x, \omega_y) \cdot G(\omega_x, \omega_y) \right] = b(x, y) * g(x, y) \qquad (4\text{-}5)$$


where f(x, y) is the output image, b(x, y) the input image and B(ω_x, ω_y) its transform, G(ω_x, ω_y) the spatial filter function and g(x, y) its inverse transform, and * denotes convolution. The resulting function f(x, y) could also have been derived with a lensless correlograph, or electronically using scanning and a delay-line cross-correlator.
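Equation 4-5 is the convolution theorem, which can be checked in one dimension with a discrete transform. The sketch assumes circular (wrap-around) convolution. It uses a hypothetical two-element difference filter g = [1, −1]. This is a simple high-pass function with a negative weight, of the kind the coherent system can realize. Filtering in the frequency domain and convolving directly give the same edge-enhanced output.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT; with inverse=True, the inverse transform (including 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def filter_in_frequency(b, g):
    """Eq. 4-5 applied literally: f = inverse transform of B * G."""
    product = [bm * gm for bm, gm in zip(dft(b), dft(g))]
    return [v.real for v in dft(product, inverse=True)]


def filter_in_space(b, g):
    """The same result as a direct circular convolution b * g."""
    n = len(b)
    return [sum(b[(x - k) % n] * g[k] for k in range(n)) for x in range(n)]


bar = [0, 0, 1, 1, 1, 0, 0, 0]           # a bright bar on a dark field
edge_filter = [1, -1, 0, 0, 0, 0, 0, 0]  # hypothetical high-pass g(x)
print(filter_in_space(bar, edge_filter))  # [0, 0, 1, 0, 0, -1, 0, 0]
print([round(v, 9) for v in filter_in_frequency(bar, edge_filter)])
```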

The coherent light optical spatial filter technique appears to have the same basic advantages and disadvantages as the lensless correlograph, listed in the previous paragraph, except that it can incorporate negative weights. Other practical problems arise in obtaining an adequate coherent light source and a good optical system. The implementation of a desired filter function may also be difficult, since the filter must vary in both transmission (amplitude) and thickness (phase), and filters with controlled thickness variations are difficult to make.

The output of a lensless correlograph, or of a coherent optical spatial filtering system, is a two-dimensional filtered image of the input transparency. The principal advantage of optical filtering is that all the filtered data are available simultaneously. The parallel sensor must be capable of thresholding the signal on a point-by-point basis over the image. Solid-state light amplifiers offer promise as sensors for this application. The use of film is another possibility, although the processing involved is a distinct drawback.


Solid  State  Optical  Decision  Panels 

The basic components of the solid-state optical decision panels (see Figure 4-4) are a sandwich of transparent conductor, photoconductive layer, non-linear resistance layer, electroluminescent layer, and a second transparent conductor.

Figure 4-3 Coherent Light Optical Spatial Filtering System for Unity Magnification

The photoconductive (PC) layer is a semiconductor whose conductance (1/R) is approximately proportional to the intensity of light impinging upon its surface. Resistance changes over a range of 4000 to 1 are not uncommon with cadmium sulfide and cadmium selenide, both of which are commonly used in photoconductive panels. The non-linear resistance is silicon carbide, commonly used for making varistors. The electroluminescent (EL) layer emits light when an ac potential is applied across the layer. The light output is a function of the amplitude and shape of the applied voltage waveform. The output depends on the material used, but the general form of the output has been observed to be


$$B = A(\omega)\, \exp\!\left(-b\, V^{-1/2}\right) \ \text{ft-lamberts} \qquad (4\text{-}6)$$


where A(ω) is a characteristic constant for a particular ac frequency ω, b is a constant whose value depends upon the material, and V is the amplitude of the ac voltage across the EL layer.
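The steepness of this relation is what makes the EL layer useful in a thresholding panel. The sketch assumes the form B = A exp(−b/√V), with A = 1 and b = 100 chosen purely for illustration. Real constants depend on the phosphor and the ac frequency.

```python
import math


def el_brightness(voltage, a=1.0, b=100.0):
    """Assumed form B = A * exp(-b / sqrt(V)); a and b are illustrative constants."""
    return a * math.exp(-b / math.sqrt(voltage))


for v in (100, 200, 400):
    print(v, "V ->", el_brightness(v))
ratio = el_brightness(400) / el_brightness(100)
print(ratio)
```

With these constants, quadrupling V from 100 to 400 multiplies B by e⁵, about 148. The varistor layer sharpens this knee further by shifting voltage onto the EL layer once the photoconductor begins to conduct.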

The  sandwich  functions  as  a  highly  non-linear  light  amplifier,  and 
can  be  used  as  a  thresholding  array.  An  ac  voltage  is  applied  across  the 
sandwich,  between  the  two  transparent  conductive  layers.  When  no  light  hits 
the  panel,  the  resistance  of  the  photoconductor  is  high  and  little  current  flows. 
As  a  result,  the  varistor-type  layer  is  of  high  resistance,  thus  further 
lowering  the  current.  The  voltage  across  the  EL  layer  is  very  low,  and  no 
light  is  radiated. 

When light strikes a point on the panel, it lowers the photoconductor resistance, resulting in increased current. This decreases the varistor-layer resistance, and the voltage across the EL layer increases sharply. This in turn results in emission of light at all points where the light input exceeds a particular threshold value. If the impinging light were a high-pass filtered image of a target, the light output would be a detail-detected reproduction of the target.

The speed of a solid-state decision panel is determined by the light flux density striking the panel. The average illuminance (flux density) in the image plane of a lensless correlograph, where the panel might serve as a sensor, is given by Equation 4-7, which is derived in Appendix D.
where

m is the number of elements in the aperture,

n is the total number of picture elements in the input,

B is the peak value (white) of luminous emittance in the input, in lumens/square foot, and

a_o and a_i are the linear dimensions of a picture element in the object and image planes, respectively, given in any consistent units. The values of a_o and a_i are determined by the input photograph resolution and the output sensor resolution, respectively.

Possible data rates for the correlograph in combination with the EL-varistor-PC panel can be computed from some typical values of the parameters. Assume a 10-element aperture with a tungsten wire source at 3200°K, a portion of the input image 1000 elements square at 2-mil resolution, and a sensor with 10-mil elements. Then


m = 10

n = 10⁶

a_o = 2 mils

a_i = 10 mils

B 

= 

10^  lumens /ft 

E 


10  •  106 
— g - 

10  •  64  ir 


.2 


10+2 
(4-8)


The decision time (rise plus decay) for one aperture under the foregoing conditions is approximately 0.1 second. By using a larger aperture, times on the order of 0.02 second are expected. These figures are based on the characteristics of an EL-PC sandwich designed for linear operation, i.e., without a silicon carbide layer.
The basic idea of a solid-state optical decision panel is very attractive because of the possibilities it offers for high-speed, parallel data processing.


The principal limitations of state-of-the-art solid-state light amplifier panels are:

1. Light outputs (10 ft-lamberts or less) are too low to be useful as input to succeeding layers of optical logic.

2. Stable PC layers with uniform photoconductivity over the entire retinal space are very difficult to obtain.

3. Resolution is limited by the EL layer to about 0.01 inch per element, which is over ten times as large as a film resolution element.

4. Decisions cannot be stored directly for later processing in conjunction with other data.

Film  Techniques 

High-resolution, rapid-process film may be used to record the output data and to preserve the information after development for additional parallel processing. Currently available films have optical resolutions of better than 1000 lines/mm, which is more than current recognition methods require. Under the same conditions as for the EL panel, the exposure time for each element would be approximately 2 seconds. Processing takes 5 to 10 seconds. However, as many frames as necessary may be processed simultaneously, so the exposure time, rather than the film processing time, is the limiting factor.

Film  has  the  following  advantages: 

1.  Ease  of  procurement  and  reliability 

2.  Permanent  storage  of  information 

3. Non-linearities can be realized in the film, i.e., the film can perform thresholding.

The disadvantages are:

1. Additional development of hardware is necessary to enhance film operation.

2. Operating conditions (temperature, humidity, etc.) must be carefully controlled.

Sequential  Detection  After  Optical  Filtering 

The signal at the output of the optical system can be detected sequentially using one of the scanning devices described in the next section; however, this technique does not take advantage of the parallel-access property of optically filtered images. In addition to its utility for feature detection, as described in the foregoing subsections, optical filtering of an entire image may be useful for simple preprocessing operations preliminary to sequential electronic scanning, e.g., where a high-pass spatial-frequency filter is applied optically to the input image for edge enhancement.
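For simulation studies, the optical high-pass operation can be imitated digitally. The sketch below is a stand-in, not the optical system itself: it assumes an ideal filter that blocks every spatial frequency (in cycles per sample) below a hypothetical `cutoff`, including the DC term, using NumPy's FFT. Uniform regions go to zero and the response concentrates at edges.

```python
import numpy as np

def highpass_edge_enhance(image, cutoff=0.1):
    """Digital stand-in for an optical high-pass spatial-frequency filter.

    Frequencies whose radial magnitude (cycles/sample) is below `cutoff`
    are blocked, including DC; all others pass unchanged.
    """
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]   # horizontal frequencies
    passband = np.hypot(fx, fy) >= cutoff          # ideal high-pass mask
    return np.real(np.fft.ifft2(spectrum * passband))

# A vertical step edge: dark left half, bright right half.
img = np.zeros((8, 64))
img[:, 32:] = 1.0
out = highpass_edge_enhance(img)
print(abs(out[0, 32]), abs(out[0, 16]))   # edge column vs. midway between edges
```

Because the FFT treats the image as periodic, the wrap-around at column 0 also counts as an edge; an ideal (sharp) cutoff also produces some ringing, which a tapered filter would reduce.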

4.3 Sequential-Access, Multiple Parallel-Masking Techniques

Scanning  Techniques 

Electronic scanning techniques are, of course, not limited to the conventional TV scan. For target recognition systems, the raster-generating waveforms can be adapted to the specific needs of the logic. For applications where it is desired to cross-correlate a particular weighting function in all positions of registry, the optimum raster waveform is a sawtooth ribbon scan with height greater than or equal to the template height (see Figure 4-5). The time-varying waveform output from a ribbon scan n elements wide is related to the retinal elements by the equation

$$v(t - i\,\Delta t) = r\left[x - (i \bmod n)\,\Delta x,\; y - \frac{i - (i \bmod n)}{n}\,\Delta y\right], \qquad i = 0, 1, \ldots \tag{4-9}$$

where t is a time reference corresponding to point (x, y) on the retina, and Δt is the temporal Nyquist interval corresponding to the spatial interval Δx (= Δy). A similar equation can be written for a continuously varying signal, v(T + t), in terms of r(X + x, Y + y), but it is more complex, since it involves continuous variation along x but discrete steps Δy along y corresponding to the interval between scanning lines.
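The index arithmetic of Eq. (4-9) can be written as a one-line helper. The function name is hypothetical; it returns, in units of (Δx, Δy), how far back on the retina the sample taken i Nyquist intervals earlier lies.

```python
def ribbon_offset(i, n):
    """Retinal offset (in units of dx, dy) of the sample taken i Nyquist
    intervals earlier in a ribbon scan n elements wide, per Eq. (4-9)."""
    return i % n, (i - i % n) // n

print([ribbon_offset(i, 4) for i in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Successive samples step one element along x; after n samples the offset moves one line along y and x starts over.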
Figure 4-5 Relationship Between Retinal Elements and Ribbon Scan Video Output Elements
The delay-line cross-correlator serves as dynamic storage for the input video, v(t). The output at any tap, v_k(t), is identical to the input video with a fixed time delay, v_k(t) = v(t - kΔt). Taps are spaced at points equivalent in delay to the Nyquist interval, Δt. Therefore, the output at each tap at time t corresponds to a specific point of the image:

$$v_k(t) = v(t - k\,\Delta t) = r\left[x - (k \bmod n)\,\Delta x,\; y - \frac{k - (k \bmod n)}{n}\,\Delta y\right] \tag{4-10}$$

The cross-correlation with a weighting function is performed by setting the gain at each tap to the desired value, w_k = w(kΔt), and summing:

$$f(t) = \sum_k w_k\, v_k(t) \tag{4-11}$$

$$f(t) = \sum_k w(k\,\Delta t)\, v(t - k\,\Delta t) \tag{4-12}$$

$$f(t) = \sum_k w\left[(k \bmod n)\,\Delta x,\; \frac{k - (k \bmod n)}{n}\,\Delta y\right] \cdot r\left[x - (k \bmod n)\,\Delta x,\; y - \frac{k - (k \bmod n)}{n}\,\Delta y\right] \tag{4-13}$$

Let

$$i = k \bmod n \tag{4-14}$$

$$j = \frac{k - (k \bmod n)}{n} \tag{4-15}$$

so that

$$f(t) = f(x, y) = \sum_i \sum_j r\left[x - i\,\Delta x,\; y - j\,\Delta y\right]\, w(i\,\Delta x,\; j\,\Delta y) \tag{4-16}$$
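The chain from Eq. (4-11) to Eq. (4-16) can be checked numerically. The sketch below (NumPy; all function names hypothetical) serializes a ribbon so that sample t = x + n·y, lays a template onto delay-line taps k = i + n·j, forms the tapped sum with `np.convolve`, and compares it with the direct double sum of Eq. (4-16). It assumes the template is no wider than the ribbon and that the comparison point lies where the whole template fits without wrapping.

```python
import numpy as np

def ribbon_video(r):
    """Serialize ribbon r[x, y] (x = fast index, 0..n-1) so sample t = x + n*y."""
    return r.ravel(order="F")

def tap_weights(w, n):
    """Place template w[i, j] (a x b, a <= n) on taps k = i + n*j, i.e.
    w_k = w(k mod n, (k - k mod n)/n), following Eqs. (4-13) to (4-15)."""
    a, b = w.shape
    if a > n:
        raise ValueError("template must not exceed the ribbon width n")
    taps = np.zeros((n, b))
    taps[:a, :] = w
    return taps.ravel(order="F")

def delay_line_correlate(v, w_taps):
    """f(t) = sum_k w_k v(t - k dt), Eqs. (4-11)-(4-12); video before t = 0 is zero."""
    return np.convolve(v, w_taps)[: len(v)]

def direct_sum(r, w, x, y):
    """Eq. (4-16) evaluated directly at retinal point (x, y)."""
    a, b = w.shape
    return sum(r[x - i, y - j] * w[i, j] for i in range(a) for j in range(b))

rng = np.random.default_rng(0)
n, length = 8, 20
r = rng.integers(0, 2, size=(n, length)).astype(float)   # binary "retina"
w = rng.normal(size=(3, 4))                               # arbitrary template
f = delay_line_correlate(ribbon_video(r), tap_weights(w, n))
x, y = 5, 10   # template fits: x >= 2 and y >= 3
print(np.isclose(f[x + n * y], direct_sum(r, w, x, y)))   # True
```

Taps for i ≥ a carry zero weight, which is why a ribbon at least as wide as the template suffices; outputs where the template would straddle the ribbon edge are not valid correlations.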


Various delay-line cross-correlator devices have been developed for radar and other signal processing applications and may be applicable to the target recognition problem. Two particular types of devices have special merit. The first, the optically tapped acoustic delay-line cross-correlator, is discussed in detail later in this section. Such correlators can handle large numbers of weighting functions in parallel with no loading problems, and their weighting functions can be changed readily by changing photographic film mosaic weighting masks. For cases where the retinal element data are preprocessed into a binary function, such as the edge/no-edge function previously described, the second type, the shift-register cross-correlator, may be used effectively. The principal merits of the shift-register cross-correlator are its high state of development for such applications (e.g., the Philco-Post Office Mail Sorter) and its ability to stop on command and hold an entire array of input data in storage. This latter property is particularly useful in laboratory research studies, where the precise nature of the input signal at the time of recognition often must be determined.
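A hypothetical software model of a binary shift-register cross-correlator illustrates the stop-on-command property. It is not a description of any actual hardware: bit position 0 holds the newest bit, each template assigns a weight (for example +1, -1, or 0) to every position, and `hold()` models stopping the shift clock so the register contents stay frozen for inspection.

```python
from collections import deque

class ShiftRegisterCorrelator:
    """Hypothetical binary shift-register cross-correlator model."""

    def __init__(self, length, templates):
        self.bits = deque([0] * length, maxlen=length)   # position 0 = newest bit
        self.templates = templates
        self.held = False

    def shift_in(self, bit):
        # While held, the shift clock is stopped and the contents are frozen.
        if not self.held:
            self.bits.appendleft(bit)

    def hold(self):
        self.held = True

    def release(self):
        self.held = False

    def scores(self):
        # Correlation of the register contents with every stored template.
        return {name: sum(wt * b for wt, b in zip(weights, self.bits))
                for name, weights in self.templates.items()}

corr = ShiftRegisterCorrelator(3, {"edge": [1, -1, 1]})
for bit in [1, 0, 1]:
    corr.shift_in(bit)
corr.hold()
corr.shift_in(0)          # ignored: the register is holding
print(corr.scores())      # {'edge': 2}
```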

Summary of Evaluation of Scanning Devices

The material which follows summarizes the conclusions of a survey of methods of scanning photographic data. The survey was conducted to determine the feasibility of scanning aerial photographs having 5000-TV-line resolution across a 9-inch field. The video bandwidths involved are commensurate with scanning 9 by 9-inch frames in times on the order of a few seconds per frame. The details of the survey are presented in Appendix C.

Table 4-1, "Comparison of Scanning Devices," lists the candidate scanning devices and evaluates each in terms of the characteristics considered significant in the present application. The devices are discussed in detail in the paragraphs which follow.

The 5000-element requirement means that six image orthicons would have to be used in parallel, or provisions made for mechanically repositioning the film or scanner in order to cover the full frame. Either case would be undesirable: the first because of image orthicon cost, and the second because of the mechanical problems involved.

The vidicon has about the same resolution as the image orthicon; however, vidicons are relatively inexpensive, so the duplication would not be so costly. The vidicon is slow: 0.1 second or more is required between successive scans, and the viewed object must remain essentially motionless during this time. In addition, overlap scanning is not possible with a single vidicon.

The non-storage property of a flying spot scanner permits it to scan a moving frame. In addition, scanning-pattern flexibility and high resolution are available and may be utilized to advantage in imagery scanning. If, for example, a rotating-phosphor scanning tube is used, it is possible to scan horizontally for 5000 elements without duplicating the scanning equipment. The element rate of a flying spot scanner is limited by the phosphor decay time; however, fast-decay phosphors such as P-16 allow data rates as high as 32 × 10^6 elements per second.
A high-brightness, high-resolution flying spot scanner could make use of mechanical scanning in the vertical direction. Film could pass through the scanning plane at a constant rate while being scanned horizontally with a ribbon scan. It appears practicable to choose a rate of film motion such that 50 percent overlap is obtained using a ribbon scan 60 elements high.
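The film speed implied by this arrangement follows from simple arithmetic. The sketch below assumes 5000 elements across 9 inches, an element rate of 10^7 per second (0.1 μsec per element, an assumed figure), ribbon sweeps that each cover the full 5000-element width, and a film advance of half the 60-element ribbon height per sweep to give 50 percent overlap. The function name and defaults are hypothetical.

```python
def film_transport(width_in=9.0, elements_across=5000, ribbon_height=60,
                   overlap=0.5, element_rate=10e6):
    """Film speed needed for a horizontal ribbon scan with a given overlap.

    Assumes each ribbon sweep covers the full width (elements_across columns)
    and that the film advances (1 - overlap) * ribbon_height elements per sweep.
    """
    pitch_in = width_in / elements_across                   # element spacing on film
    sweep_s = elements_across * ribbon_height / element_rate
    advance_in = (1 - overlap) * ribbon_height * pitch_in   # film travel per sweep
    speed_in_s = advance_in / sweep_s
    frame_s = width_in / speed_in_s                         # time to pass a 9-inch frame
    return pitch_in, sweep_s, speed_in_s, frame_s

pitch, sweep, speed, frame = film_transport()
print(f"{pitch:.4f} in/element, {sweep*1e3:.0f} ms/sweep, {speed:.1f} in/s, {frame:.1f} s/frame")
# 0.0018 in/element, 30 ms/sweep, 1.8 in/s, 5.0 s/frame
```

With 50 percent overlap every film element is scanned twice, so the frame time is double that of a single pass at the same element rate.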

The primary difficulty with flying spot scanners is that of obtaining a trace bright enough to provide a good signal-to-noise ratio from the photomultiplier. The rotating-phosphor tube answers this need, but at considerable expense. In addition, these devices are designed for single-line scanning and may not retain high resolution when a wide ribbon scan is used.

The image dissector is capable of up to 3000 lines resolution. However, it cannot provide high resolution, high data rate, and good signal-to-noise ratio simultaneously. For the proposed application, satisfactory resolution and data rate result in an extremely poor calculated signal-to-noise ratio of 2 to 1. To improve this, it would be necessary to increase the picture element size in proportion to the signal-to-noise ratio increase required.

A flying spot scanner is the best available type of electronic scanning device. The other three devices are undesirable for the following reasons:

Vidicon: lag time, resolution, and storage property

Image Orthicon: resolution and storage property

Image Dissector: signal-to-noise ratio and data rate
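The image-dissector trade-off described above can be quantified under one stated assumption: shot-noise-limited operation at a fixed element rate, so that photoelectrons per element scale with element area and the signal-to-noise ratio scales with element width. The starting point (3000 lines at 2:1) and the 10:1 target below are illustrative only, and the function name is hypothetical.

```python
def dissector_lines(lines, snr, target_snr):
    """Resolution (lines across the field) left after enlarging the picture
    element so that S/N rises from `snr` to `target_snr`.

    Assumes shot-noise-limited operation at a fixed element rate: S/N grows
    linearly with element width, and lines across the field fall in proportion.
    """
    growth = target_snr / snr      # required increase in element width
    return lines / growth

print(dissector_lines(3000, 2, 10))   # 600.0
```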

The remaining question is whether a conventional flying spot scanner can be constructed using available high-resolution cathode ray tubes, or whether specialized equipment must be developed (perhaps using the rotating-phosphor tube). The results of Appendix F indicate that, through careful design, a conventional flying spot scanner can be constructed that provides an adequate signal-to-noise ratio for Laplacian and gradient detection. Therefore, a carefully designed conventional flying spot scanner is recommended for electronic scanning.

Cross-Correlation With Sequential Accessing

A recognition system which scans the input image, inserts it sequentially bit by bit into a dynamic storage device, and correlates the contents against a set of reference patterns has the advantage of effectively testing the correlation between the input data and the templates in all possible translational registrations.
The storage element can be a shift register, a delay line, or a device which converts the serial data into a parallel-accessed array; the required capacity is 1000 bits, at a data rate of 10 megabits per second. To perform the correlation, the storage element must be connected to a semipermanent memory containing the reference templates.

Shift registers are available in solid-state, thin-film, and ferrite-core configurations. Core register shift rates are less than a megacycle and are therefore inadequate for this application. The thin-film shift register has higher speed and high density; available units shift at a 2-megacycle rate and have capacities of 256 bits. Future development of this device may increase the data rate sufficiently to qualify it for this application. Solid-state registers can be obtained with shift rates up to 20 megacycles and capacities up to 300 bits. These registers are relatively expensive, and the problems in increasing the capacity to 1000 bits are apparently great enough that some suppliers have refused to attempt more than a 300-bit size. A 1000-bit, 10-mc register would need to be developed and would cost at least $20K to $50K.

A Golay lumped-constant delay line has been built with a 2-megacycle bandwidth and 200 taps. The attenuation of the line was about 35 db, but indications are that this could be reduced. A 1000-tap line with a 10-megacycle bandwidth would be a considerable advance in the state of the art; the parts alone would cost in the $10K to $20K range. Magnetostrictive delay lines are unattractive because of the high attenuation (3 db) per tap and the mechanical and electrical problems connected with multiple read-out. Glass delay lines using optical read-out are available with 3 to 5 megacycle video bandwidths and in delays of up to 200 μsec as a practical limit per unit. The attenuation over the length of a 100-μsec line is only 3 db (the principal losses being in the input transducer). To the cost of the delay line itself must, of course, be added the cost of auxiliary optical and electronic equipment for driving and reading out.

The library of stored patterns against which the input data are to be correlated must be parallel accessible. Photographic film offers high-density analog or digital data storage at low cost, is easily changed, and is read out with light. Resistive matrices have been used in recognition apparatus, but they are rather expensive and tedious to construct; a rough estimate of cost would be $5K to $10K. Diode matrices fall into a similar category, except that the storage is restricted to binary information. Capacitor-card, metal-card magnetic, and ferromagnetic semipermanent stores have been described in the literature, but they are still principally developmental and therefore not readily available.
Because the photoelastic glass delay line is capable of high data rates, high storage capacity, and parallel read-out at reasonable cost, it appears to be a most attractive candidate for serious implementation study. The current state of the art affords data rates up to 10 megabits per second and a total storage of 1000 bits for a 100-μsec line. The fused silica line is approximately 15 inches long and should cost from $1,000 to $2,000. This does not, of course, include the necessary optics and driving electronics.
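The quoted figures are mutually consistent, as a short calculation shows: 1000 bits at 10 megabits per second fill a 100-μsec line, and a 15-inch line then implies a propagation velocity of about 3.8 km/s, which is of the order of the shear-wave velocity in fused silica, and a bit spacing of about 0.38 mm. The function name is hypothetical; the inputs are the figures given in the text.

```python
def delay_line_geometry(capacity_bits=1000, bit_rate=10e6, length_in=15.0):
    """Delay, implied acoustic velocity, and bit spacing of a tapped delay line."""
    delay_s = capacity_bits / bit_rate                 # time to fill the line
    length_m = length_in * 0.0254
    velocity_m_s = length_m / delay_s                  # implied propagation speed
    bit_spacing_mm = length_m / capacity_bits * 1e3    # distance between bit cells
    return delay_s, velocity_m_s, bit_spacing_mm

delay, velocity, spacing = delay_line_geometry()
print(f"{delay*1e6:.0f} us, {velocity:.0f} m/s, {spacing:.3f} mm per bit")
# 100 us, 3810 m/s, 0.381 mm per bit
```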

Previous applications of the photoelastic delay line have been for the purpose of obtaining a delay device which can readily be tapped, either singly or multiply, at any desired position on the line. The use of the device described here is different: the expected operation is similar to that of a large-capacity shift register with all bit positions available simultaneously for parallel output. This type of operation will require experimental work to ascertain its practicability and to assess the magnitude of the problems involved.

The fact that the glass delay line seems to afford the best solution to the requirement for a multiply tapped, large-capacity storage device, coupled with the fact that photographic film is the most flexible and economical medium for the semipermanent template store, leads quite clearly to the highly compatible combination of the two as a preferred implementation. The electrical interconnection of 200,000 points in the template store and shift register is replaced with an optical interconnection; the glass delay line not only provides the shifting and data-storing function, but also provides the light source directly for interrogation of the photographic memory bank.

Figure 4-6 shows a typical arrangement for optically tapping a glass acoustic delay line. A ceramic transducer bonded to one end face of the glass slab propagates an acoustic wave through the length of the bar. An absorber is bonded to the other end to prevent reflection of the wave after it has traversed the length of the bar. An acoustic signal is propagated as a mechanical stress; therefore, the wave traveling down the glass is a moving stress pattern. Glass, and certain other materials, become optically birefringent when subjected to mechanical stress; that is, the glass exhibits different indices of refraction for light polarized parallel or perpendicular to the direction of stress. This produces a change in the phase relationship between two mutually perpendicular polarization components of light projected through the glass.

Referring to Figure 4-6, a light source and lens produce a collimated beam of light, which is projected through the glass bar in a direction perpendicular to the propagated sonic wave. A viewing slit is placed on the far side to mask out all the light except a narrow aperture, which is adjusted in width to be a fraction of a wavelength of the RF carrier in the glass medium. A polarizer is inserted at the input face of the line, and an analyzer is placed at the output face with its orientation adjusted to cut off the transmitted light completely. With the crossed polaroids, no light will be seen by a photomultiplier which views the slit as long as no stress is introduced into the glass bar. Acoustic stress waves traveling down the line, however, will effectively rotate the polarization of the light as it passes through the bar and produce a light output. The transfer characteristic is indicated in Figure 4-6.

In order to obtain a true reproduction of the sonic wave in the glass bar, a quarter-wave bias plate is introduced to shift the operating point to the center of the characteristic, as indicated in the second transfer curve in Figure 4-6. Best linearity and sensitivity are also obtained at this point.
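The effect of the quarter-wave bias can be seen from the standard transmission law for a retarder between crossed polarizers with its principal axes at 45 degrees, T = sin²(δ/2), where δ is the stress-induced retardation. The sketch assumes δ is proportional to the acoustic stress; the function name is hypothetical. Unbiased, equal and opposite stresses give equal outputs (the sign is lost); with a π/2 bias the output swings symmetrically about the half-transmission point, where the slope is greatest and the curvature is zero.

```python
import numpy as np

def transmission(delta, bias=0.0):
    """Relative transmission through crossed polarizer and analyzer when the
    stressed glass (principal axes at 45 degrees) adds retardation `delta`
    radians; `bias` is a fixed retardation, e.g. pi/2 for a quarter-wave plate."""
    return np.sin((delta + bias) / 2) ** 2

small = 0.1
# Unbiased: +delta and -delta give the same output, so the sign is lost.
print(transmission(small), transmission(-small))
# Quarter-wave bias: output deviates by equal and opposite amounts about 0.5.
print(transmission(small, np.pi / 2) - 0.5, transmission(-small, np.pi / 2) - 0.5)
```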

The  delay  can  be  easily  adjusted  to  any  value  desired  by  moving  the  bar  with 
respect  to  the  rest  of  the  optical  components  and  thus  changing  the  point  at 
which  read-out  occurs. 

A diagram of the general optical correlator system is shown in Figure 4-7. It is assumed that a scanner and preprocessor generate a continuous ribbon of input data from the scanned photograph and that the input to the correlator is an array of bits describing each picture element by means of a ONE or a ZERO (it could just as easily be analog, however).

The  input  data  is  used  to  modulate  a  carrier  which  is  propagated  as 
an  acoustical  stress  wave  down  the  glass  delay  line  by  means  of  the  ceramic 
or  quartz  transducer  cemented  to  the  end.  A  collimated  polarized  light 
source  of  sufficient  size  to  illuminate  the  entire  length  of  the  bar  projects 
light  through  the  glass  at  right  angles  to  the  direction  of  wave  travel.  When 
viewed  through  an  analyzer  and  a  slit  at  each  bit  location,  the  entire  contents 
of  the  delay  line  appear  as  a  moving  array  of  modulated  light  patterns.  An 
optical  system  can  project  this  data  array  onto  a  bank  of  photographic  mask 
sets.  A  better  method,  which  eliminates  the  optics,  is  shown  in  Figure  4-7 
and  places  the  photographic  mask  sets  directly  after  the  analyzer  and  slit 
array  on  the  output  face  of  the  delay  lines. 

The modulated light coming from each bit position in the input data array is transmitted through the photographic masks for all recognition classes; only two classes are shown in Figure 4-7 for simplicity. Each class mask is divided into an area for indicating positive weighting for any bit location and an area for negative weighting. The magnitude of the weight to be applied is determined by the degree of exposure of that particular bit location on the mask during the preparation of the mask. The correlation of each bit in the input data array with the stored reference pattern on the mask is therefore simply the amount of light transmitted through the mask. The sign, positive or negative, is indicated by the location of the weight in one of the two specified mask areas: if the weight is positive, the negative-weight space is black and the positive-weight space is clear, and vice versa when the weight is negative. Correlation of the input pattern with the stored pattern is taken as the algebraic summation of all the positive and negative bit correlation quantities. Two photosensors, with their necessary optics, are arranged to view the positive-weight areas and the negative-weight areas, respectively, for each class, over the entire length of the delay line. This is shown schematically in Figure 4-7 as a cylindrical lens system designed to reduce the length of the viewed area to a dimension compatible with the photocathode size in the viewing photosensor. It is anticipated that a fiber optic assembly may be superior to a lens for collecting the total transmitted light and performing the appropriate change in geometry; this is discussed in Appendix G. The photosensor outputs for each class are then subtracted electrically, and a threshold is applied to make an output decision.
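The two-photosensor scheme reduces to a pair of weighted sums followed by a subtraction and a threshold. In the sketch below (hypothetical names; an idealized model that ignores optical losses), each weight's magnitude stands for the mask transmission at that bit location and its sign selects the positive- or negative-weight area, the other area being black.

```python
import numpy as np

def two_sensor_decision(bits, weights, threshold):
    """Two-photosensor mask correlation for one recognition class.

    bits    : 1 where a bit position is lit (ONE), 0 where dark (ZERO)
    weights : signed weights in [-1, 1]; magnitude = mask transmission,
              sign = which mask area (positive or negative) is clear
    """
    light = np.asarray(bits, dtype=float)
    w = np.asarray(weights, dtype=float)
    pos_area = np.where(w > 0, w, 0.0)    # negative-weight area is black here
    neg_area = np.where(w < 0, -w, 0.0)   # positive-weight area is black here
    pos_sensor = light @ pos_area         # total light through positive areas
    neg_sensor = light @ neg_area         # total light through negative areas
    score = float(pos_sensor - neg_sensor)   # electrical subtraction
    return score, score > threshold

print(two_sensor_decision([1, 0, 1, 1], [0.5, -1.0, 1.0, -0.25], threshold=1.0))
# (1.25, True)
```

In the hardware, one such pair of sensors would serve each class mask, all viewing the same moving data array.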

The advantage of a system of the type under discussion here lies in the possibility of simultaneously correlating a large section of the serially presented input data with a very large number of reference patterns. The number of such classes is limited only by the practical problems of registry, class separation, and physical location of the required photosensors. Obtainable data rates also appear adequate for the desired recognition rates.

From the standpoint of threshold stability, it is highly desirable to be able to sum the positive and negative mask correlations algebraically without the difficulties of balance and symmetry usually experienced in subtraction circuitry. Several methods have been proposed and examined for avoiding the drift and device-matching problems attendant on photocell bridge and electrical differencing circuits. At least two of these appear sufficiently practical to warrant serious attention: a carrier-commutated light subtraction technique and an interdigitated grating light commutation arrangement. These methods are described in detail in the following paragraphs.

Figure 4-8, "Single Sensor Optical Correlator Using Carrier Commutation," shows a system which differs in only two respects from the original two-sensor arrangement. First, only one photosensor, viewing both the positive and negative mask areas, is used. Second, the analyzer has been replaced with a composite which equips all positive-weight areas with a polaroid analyzer at the same orientation as the original, and equips all negative-weight areas with an analyzer oriented at right angles to that of the original.

Figure 4-8 Single Sensor Optical Correlator Using Carrier Commutation


To explain the effect of these changes, consider the basic single-slit read-out arrangement described earlier. The effect of the stress variations traveling past the read-out slit was to perturb the vector representing the light polarization about the 45° bias point. When viewed with the analyzer, this gives the 0° transfer characteristic shown in Figure 4-8a. If the analyzer in this original arrangement is rotated through 90°, the transfer characteristic becomes that shown for 90° in Figure 4-8a. The light intensity variation at the carrier frequency, as seen by the photomultiplier, is now of reversed polarity for the 90° analyzer position compared with the original 0° position.

Another way of looking at it is illustrated in Figure 4-8b. Assume the mean position of the output light polarization to be at 45°. Two analyzers illuminated by this light and arranged at 0° and 90° will resolve the incident light into its 0° and 90° components. Movement of this input light vector caused by the stress wave in the glass causes the variations in light transmitted by the two analyzers to be of opposite phase; i.e., when the light through the 0° analyzer is instantaneously increasing, the light through the 90° analyzer is decreasing. A photosensor which views the light variations transmitted by both analyzers observes the difference. The polarity and magnitude of the photosensor signal can therefore be determined by introducing a mask between the two analyzers and the sensor in order to alter the relative transmission of the light paths. Only one sign of weighting is used at any one bit location; the other sign has an opaque mask. A full positive or negative weight would use a mask which was transparent in one analyzer area and opaque in the other. The light shown coming through both masks in Figure 4-8b actually refers to the summation of light from all the masks of each sign.
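The single-sensor subtraction can be checked with Malus's law, using the simplified picture above in which the stress wave rotates the polarization by a small angle ε about 45°. The sketch assumes a 0° analyzer over the positive-weight area (mask transmission t_pos) and a 90° analyzer over the negative-weight area (t_neg); function names are hypothetical, and which analyzer is called "positive" is an arbitrary sign convention. The carrier-frequency component works out to (t_neg - t_pos)·sin(2ε)/2, so equal masks cancel and the sign follows whichever area is clear.

```python
import numpy as np

def sensor_output(eps, t_pos, t_neg):
    """Light at the single photosensor when the polarization lies at
    45 degrees + eps; Malus's law gives cos^2 through the 0-degree analyzer
    (positive-weight area) and sin^2 through the 90-degree analyzer."""
    theta = np.pi / 4 + eps
    return t_pos * np.cos(theta) ** 2 + t_neg * np.sin(theta) ** 2

def carrier_component(eps, t_pos, t_neg):
    """Change in sensor output caused by the stress-induced rotation eps."""
    return sensor_output(eps, t_pos, t_neg) - sensor_output(0.0, t_pos, t_neg)

eps = 0.05
print(carrier_component(eps, 1.0, 0.0), carrier_component(eps, 0.0, 1.0))
```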

The carrier-commutated system requires good registration of the viewing slits with respect to the carrier wave in the delay line, so that the light from all slits will add in phase for each direction of polarization. Since an array of 1000 such slits occupying 15 inches is contemplated, a high degree of accuracy is required; the higher the carrier frequency, the greater the required slit location accuracy. It may be necessary to provide the system shown in Figure 4-9 with an automatic frequency control loop to maintain the carrier in registry with the slit array.
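Why higher carrier frequencies demand tighter slit placement follows from the carrier wavelength in the glass, λ = v/f: a slit displaced by Δx introduces a phase error of 360°·Δx/λ. The sketch below assumes a velocity of 3810 m/s (implied by a 15-inch, 100-μsec line) and a hypothetical phase budget of 10 degrees per slit; both are illustrative, not design values from this study.

```python
def slit_tolerance_um(carrier_hz, phase_budget_deg=10.0, velocity_m_s=3810.0):
    """Largest slit-position error (micrometres) that keeps the carrier phase
    error at a slit within phase_budget_deg."""
    wavelength_m = velocity_m_s / carrier_hz        # carrier wavelength in the glass
    return wavelength_m * phase_budget_deg / 360.0 * 1e6

for f in (10e6, 20e6, 40e6):
    print(f"{f/1e6:.0f} mc: {slit_tolerance_um(f):.1f} um")
```

The tolerance is inversely proportional to carrier frequency, and it must be held for every one of the 1000 slits along the full 15-inch array.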
A simpler method appears feasible for providing single-sensor algebraic summation of the positive and negative correlations. It is based on the use of a mechanism for alternately presenting the positive and negative mask correlation quantities to the photosensor; an illustration of this is shown in Figure 4-9. The glass delay line is uniformly illuminated by a collimated polarized light source, as before, and the data are again used to modulate a carrier which drives the transducer. The quarter-wave bias plate is omitted, however, to place the operating point at minimum transmission. A burst of carrier therefore causes a full-wave rectified light output (see Figure 4-10) when viewed through the analyzer.

In order to present the positively and negatively weighted light quantities alternately to the photosensor, a moving grating is interleaved with the input data. Each data bit is reduced to half its former duration, and a ZERO (no light) is inserted between successive bits. The signal driving the line is shown in Figure 4-11.

Every other bit location in the input signal is made a ZERO to form the grating; the remaining bit locations carry the data, with an RF burst inserted for a ONE and no RF for a ZERO.
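The interleaving of data and grating can be sketched as follows. Each data bit becomes a half-length slot followed by a ZERO slot, and a ONE slot carries an RF burst. The default of two carrier cycles per slot is a hypothetical choice (it corresponds, for example, to a 40-mc carrier at 20 million slots per second), as are the sampling density and the function name.

```python
import numpy as np

def grating_drive(bits, cycles_per_slot=2, samples_per_cycle=8):
    """Interleave the data with a ZERO grating and render the RF drive signal.

    Each data bit occupies one half-length slot followed by a ZERO slot;
    a ONE slot carries an RF burst, a ZERO slot carries no RF.
    """
    slots = []
    for b in bits:
        slots.extend([b, 0])                      # data half-bit, then grating ZERO
    n = cycles_per_slot * samples_per_cycle       # samples per slot
    phase = 2 * np.pi * np.arange(n) / samples_per_cycle
    burst = np.sin(phase)                         # carrier burst for one slot
    rf = np.concatenate([s * burst for s in slots])
    return np.array(slots), rf

slots, rf = grating_drive([1, 1, 0])
print(slots)   # [1 0 1 0 0 0]
```

An all-ONES input yields alternating burst and no-burst slots, i.e., light and dark stripes at twice the data bit rate.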

The mask format contains two possible positions for weighting at each bit location: the bit location is divided in half. For positive weighting, the first half is used; for negative weighting, the second half is used. In other words, the mask for each class weights each bit either positively or negatively by blacking out one of the two halves at each bit location and inserting the weight value in the other half by shading.

The effect of the moving grating and the stationary mask format is to transmit to the photosensor first the input data multiplied by all the positive mask weights, and then the input data multiplied by all the negative weights.

The  alternating  component  of  the  photosensor  output  is  then  phase 
detected  with  the  signal  used  to  form  the  grating  in  order  to  determine  the 
sign  and  magnitude  of  the  total  algebraic  summation  for  each  class. 
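The phase detection step can be sketched with an idealized model (hypothetical names; square-wave alternation, no noise). The sensor alternately sees P, the data weighted by the positive mask halves, and N, the data weighted by the negative halves. Removing the mean and multiplying by a ±1 reference synchronized with the grating leaves (P - N)/2, whose sign and magnitude give the algebraic sum for the class.

```python
import numpy as np

def phase_detect(sensor, reference):
    """Synchronous detection: remove the mean, multiply by the +/-1 grating
    reference, and average. Returns half the signed peak-to-peak swing."""
    ac = sensor - sensor.mean()
    return float(np.mean(ac * reference))

def algebraic_sum(bits, weights, cycles=4):
    """Idealized model: the sensor alternately sees the data weighted by the
    positive mask halves (P) and by the negative mask halves (N)."""
    bits = np.asarray(bits, dtype=float)
    w = np.asarray(weights, dtype=float)
    p_sum = bits @ np.where(w > 0, w, 0.0)
    n_sum = bits @ np.where(w < 0, -w, 0.0)
    sensor = np.tile([p_sum, n_sum], cycles)      # P, N, P, N, ...
    reference = np.tile([1.0, -1.0], cycles)      # grating-phase reference
    return 2 * phase_detect(sensor, reference)    # equals P - N

print(algebraic_sum([1, 0, 1, 1], [0.5, -1.0, 1.0, -0.25]))   # 1.25
```

Because a single sensor sees both P and N, gain drift in that sensor scales both equally and cannot unbalance the subtraction.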

Interdigitated grating operation is much simpler than carrier commutation. No slit array is required, which avoids fabrication and registry problems, and binary light operation is used, which permits some improvement in bit resolution (see Appendix G). This is obtained at the expense of using only half the bit capacity of the delay line for holding the data. If it is assumed that the same data input is used for this arrangement as for carrier commutation, the effective bit rate necessary in the delay line for the same data rate is exactly doubled. If the data input contained all ONES, for example, the delay line contents would be a train of alternating light and dark stripes at twice the bit rate. This approach is attractive for high-frequency transducer operation. For a 10-megabit data rate, a carrier frequency of 40 mc and a transducer bandwidth of 20 mc would be adequate. Since slit read-out is not used, there is no problem of phase variation at high carrier frequencies due to slit misalignment, as there may be with a carrier-commutated system.

Figure 4-10 Transfer Characteristic Used for Interdigitated Grating Commutation
SECTION 5


CONCEPTUAL  DESIGN 
OF  AN  OBJECT  RECOGNITION  SYSTEM 


5.1 General


Figure 5-1 presents a conceptual block diagram of a system tentatively proposed for the image screening application. Designing and evaluating such a system provides a test of the validity of the concepts deriving from the study. The fundamental building blocks of the system to be considered here are the following:

•  a  high  definition  flying  spot  scanner  with  digitally  controlled 
film  transport  and  sweeps 

•  electronic  preprocessing  and  recognition  circuitry 

•  digital  logic  to  combine  outputs  of  recognition  sub-units, 
code  the  outputs  properly,  and  command  the  scanner 

•  a  means  for  recording  coordinates  of  detected  objects 

• an off-line display for convenience of the photo-interpreter

• a control console

The  discussion  which  follows  treats  each  of  these  items  in  some  detail. 

5.2 Flying Spot Scanner

Since high-quality aerial photography may involve as many as 5000 separate resolution elements across a 9-inch dimension, a high-resolution sensor is required. A summary of the scanning sensors available and of the many considerations involved in the choice of a sensor is presented in Appendix E. At present, the best sensor choice appears to be a twin flying-spot scanner, as shown in Figure 5-2, working with the semitransparent film negative. A twin scanner is necessary because flying spot scanners with overall resolutions of 5000 television lines or better lie near the frontier of the art; when 5000 lines are obtained, they are achieved at the cost of low beam current, thin phosphor, and low brightness output.

In the proposed twin scanner, each section has an overall resolution
in excess of 3000 lines, so that a total resolution of 5000 lines or better
is achieved. By working directly with negatives or positive transparencies,
the system obtains light collection efficiencies as much as 100 times greater
than could be obtained with opaque prints. Appendix F shows that by using
P-16 phosphor in the CRT, a video bandwidth of 5 Mc/s should be possible with
a signal-to-noise ratio of 40 db. This rate corresponds to one picture element
every 0.1 μsec, or an entire 9- by 9-inch frame every 2.5 seconds, if every
picture element is scanned once.
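The timing figures above can be checked with a few lines of arithmetic. The sketch below assumes a square 5000 x 5000 sampling grid over the 9- by 9-inch frame and one picture element every 0.1 μsec (10 million elements per second); the constant and function names are illustrative only.

```python
# Frame-time check for the twin flying-spot scanner (illustrative constants).
ELEMENTS_PER_SIDE = 5000             # resolution elements across the 9-inch frame
ELEMENTS_PER_SECOND = 10_000_000     # one picture element every 0.1 microsecond


def frame_time_seconds(elements_per_side: int = ELEMENTS_PER_SIDE,
                       elements_per_second: int = ELEMENTS_PER_SECOND) -> float:
    """Time to visit every picture element of one square frame exactly once."""
    total_elements = elements_per_side ** 2
    return total_elements / elements_per_second


if __name__ == "__main__":
    print(f"{ELEMENTS_PER_SIDE ** 2:,} elements per frame")
    print(f"{frame_time_seconds():.1f} seconds per frame")
```

Because the frame time grows with the square of the linear resolution, halving the resolution to 2500 elements per side would cut it by a factor of four, to 0.625 seconds.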

The film transport can be a digitally controlled servo system that advances
the film and starts, stops, and reverses on command from the recognition
logic. A compensating vertical deflection signal can be obtained from this
servo and applied to the scanner to compensate for instantaneous positioning
errors. The film can either advance continuously or be stopped occasionally
for repeated inspection of suspect areas at different scale factors.

The scanner sweeps are periodic, but of variable size and pitch
determined by digital command from the control console and recognition logic.
Figure 5-3 shows the general form of the scan, which is a sawtooth ribbon
progressing across the width of the 9-inch film. Normally, the ribbon proceeds
uninterruptedly across the film, crossing first one CRT and then the other,
with sufficient overlap between the fields of the two CRTs to ensure
recognition of objects that lie in the region of overlap.

The normal size of the scanning ribbon is determined by the maximum
dimension of the objects to be recognized as individual entities in the scaled
photograph. For example, if the device is programmed to locate armored
vehicles, the ribbon will have a vertical dimension twice that of the largest
tank diagonal expected in the photograph.
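Since Figure 5-3 is not reproduced here, the following sketch only approximates the ribbon geometry it describes: each sweep is a vertical ramp of the ribbon height, and successive sweeps step across the film by a fixed horizontal pitch. The ribbon height follows the rule above (twice the largest object diagonal). The function names, the sample count, and the neglect of flyback and of the hand-over between the two CRTs are assumptions made for illustration.

```python
import math
from typing import Iterator, Tuple


def ribbon_height(max_length: float, max_width: float) -> float:
    """Ribbon height: twice the diagonal of the largest expected object."""
    return 2.0 * math.hypot(max_length, max_width)


def sawtooth_ribbon(width: float, height: float, pitch: float,
                    samples_per_sweep: int) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) sample positions of a simplified sawtooth ribbon scan.

    Each sweep is a vertical ramp from y = 0 toward y = height at a fixed x;
    the next sweep starts `pitch` further across the film. Flyback and the
    hand-over between the two CRTs are not modelled.
    """
    n_sweeps = int(width // pitch) + 1
    for k in range(n_sweeps):
        x = k * pitch
        for i in range(samples_per_sweep):
            yield x, height * i / samples_per_sweep


# Hypothetical object 3 units long and 4 units wide (diagonal 5 units).
h = ribbon_height(3.0, 4.0)
print(h)
print(list(sawtooth_ribbon(width=1.0, height=2.0, pitch=0.5, samples_per_sweep=2)))
```

In the real scanner the spot would also drift horizontally during each sweep as the ribbon progresses; the sketch keeps x fixed per sweep to make the pitch easy to see.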

The output of the scanner is 5 Mc/s gray-scale video. This signal is
then preprocessed to obtain a quantized signal that emphasizes edges, corners,
and straight lines. The presently preferred method of extraction is to obtain
a signal roughly proportional to the magnitude of the two-dimensional gradient
vector of the brightness signal, i.e., |Grad B(x, y)|. The circuit shown in
Figure 5-4 is an effective way to obtain this signal. Appendix A shows that a
gradient signal obtained in this way with a tapped delay line has a
signal-to-noise ratio only 3 db less than that of the original gray-scale
video from which it is derived. Both the original gray-scale information and
the quantized gradient signal then become inputs to the registry-testing
cross-correlators that follow.
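Figure 5-4 is not reproduced here, so the sketch below assumes the simplest tapped-delay-line gradient: a horizontal difference between adjacent picture elements and a vertical difference between elements one scan line apart, combined as a Euclidean magnitude and then quantized by a fixed threshold. The function names, edge handling, and threshold value are illustrative; the sketch does not model the noise behaviour analysed in Appendix A.

```python
import math
from typing import List, Sequence


def gradient_magnitude(stream: Sequence[float], line_length: int) -> List[float]:
    """Approximate |Grad B| for a raster-ordered brightness stream.

    Software analogue of a tapped delay line: the horizontal difference uses
    the tap one picture element back, the vertical difference the tap one
    scan line back. Elements with no left or upper neighbour output 0.
    """
    out = []
    for n, b in enumerate(stream):
        if n < line_length or n % line_length == 0:
            out.append(0.0)
            continue
        gx = b - stream[n - 1]             # horizontal brightness difference
        gy = b - stream[n - line_length]   # vertical brightness difference
        out.append(math.hypot(gx, gy))
    return out


def quantize(values: Sequence[float], threshold: float) -> List[int]:
    """Binary quantization of the gradient signal."""
    return [1 if v >= threshold else 0 for v in values]


# A 3 x 3 patch with a vertical edge between the second and third columns.
patch = [0, 0, 10,
         0, 0, 10,
         0, 0, 10]
grad = gradient_magnitude(patch, line_length=3)
print(quantize(grad, threshold=5.0))
```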

5.3 Recognition Logic and Components

The recognition logic described here is designed with the following
considerations in mind:
Figure 5-3 Sawtooth Ribbon Scan Used With Twin Scanner

Figure 5-4 Method for Deriving a Signal Proportional to the Magnitude of the Brightness Gradient
• Object recognition circuits and networks are set up to produce a
recognition output only when the desired object is centrally located
and aligned with the recognition circuitry in optimum translation,
with its major axis within ±10° of the design center for the logic.

• Remaining angles of rotation are covered by duplicating the recognition
logic in 8 to 17 other orientations, as needed.

• Optimum translation is obtained by letting the input signals propagate
down a delay-line cross-correlator in such a way that input patterns pass
through every possible position of translational registry as they propagate
down the lines. The threshold recognition circuits and all the combinational
logic that follows must then have sufficient switching speed to produce
recognition outputs at any of the many positions of registry through which
the input image passes.
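The translation search in the last item can be illustrated with a software analogue of a tapped delay line. In this sketch (an assumption, not the hardware of Appendix G), the image arrives as a raster-ordered stream, a template is expressed as a set of tap delays and weights, and a threshold is tested at every time step, so each step corresponds to one position of translational registry. A two-dimensional template maps to delays by adding one line length of delay for each row above the current element; the template and image are hypothetical.

```python
from typing import Dict, List, Sequence


def delay_line_correlator(stream: Sequence[float], taps: Dict[int, float],
                          threshold: float) -> List[int]:
    """Sliding weighted sum over a tapped delay line.

    `taps` maps a delay (in picture elements) to a weight. At each step n the
    output is sum(weight * stream[n - delay]); every n at which the sum
    reaches `threshold` is reported. Because the input advances one element
    per step, each step tests one position of translational registry.
    """
    longest = max(taps)
    hits = []
    for n in range(longest, len(stream)):
        total = sum(w * stream[n - d] for d, w in taps.items())
        if total >= threshold:
            hits.append(n)
    return hits


# Hypothetical template: a two-element vertical bar in a raster with 5 elements
# per line, i.e. taps at delay 0 (current element) and delay 5 (one line above).
LINE = 5
template = {0: 1.0, LINE: 1.0}
image = [0, 0, 1, 0, 0,
         0, 0, 1, 0, 0,
         0, 0, 0, 0, 0]
print(delay_line_correlator(image, template, threshold=2.0))
```

The reported index 7 is the raster position of the bar's lower element (row 1, column 2). Templates near the left or right edge of a line can also match patterns that wrap around from the adjacent line; the sketch does not suppress such cases.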


The recognition logic is organized as shown in Figure 5-5. An initial layer
of threshold logic provides local-area feature detection. These local-area
features, together with a few important feature pairs (generated by AND gates),
are then combined with linear weights (a linear discriminant function) to
achieve a threshold object detection. Eight or more of these individual object
detections, each for a different range of angle, are then combined in a
multilegged OR gate to indicate final object recognition. Although only one
delay line is shown in Figure 5-5, at least two such delay lines will be used,
one containing gray-scale information and the other the quantized gradient,
with appropriate feature detectors on each.
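The layered organization just described can be sketched in a few functions. This is an illustrative model, not the report's circuit: the features are taken as already-computed binary outputs of first-layer threshold units (in practice each would come from `threshold_unit` applied to weighted delay-line taps), and all weights, thresholds, and feature pairs are hypothetical. For simplicity every angle detector shares one set of second-layer weights.

```python
from typing import List, Sequence, Tuple


def threshold_unit(inputs: Sequence[float], weights: Sequence[float], theta: float) -> int:
    """Threshold element: output 1 when the weighted sum of inputs reaches theta."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= theta)


def object_detector(features: Sequence[int], pairs: List[Tuple[int, int]],
                    weights: Sequence[float], theta: float) -> int:
    """Second layer for one range of angle.

    The binary local-area features are extended with AND-ed feature pairs,
    then combined by a linear discriminant (weights) and a threshold (theta).
    """
    terms = list(features) + [features[i] & features[j] for i, j in pairs]
    return threshold_unit(terms, weights, theta)


def recognize(features_by_angle: Sequence[Sequence[int]], pairs: List[Tuple[int, int]],
              weights: Sequence[float], theta: float) -> bool:
    """Multilegged OR: the object is recognized if any angle detector fires."""
    return any(object_detector(f, pairs, weights, theta) for f in features_by_angle)


# Hypothetical example: three features, one important pair (features 0 and 1).
PAIRS = [(0, 1)]
WEIGHTS = [1.0, 1.0, 0.5, 2.0]   # the last weight applies to the AND-ed pair
THETA = 3.0
print(recognize([[0, 0, 1], [1, 1, 0]], PAIRS, WEIGHTS, THETA))
```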


The delay-line cross-correlator and associated feature detectors are
the heart of the object recognition system. It is the ability of this parallel
network to perform a large number of linear summations and threshold decisions
per unit time that makes the device effective for rapid object recognition.
Consider a delay line containing 1000 separately resolvable elements, each
capable of being individually weighted and brought to a threshold decision.
Perhaps only 100 of these elements will actually receive weights for any one
threshold input. Each picture element occupies 0.1 μsec of space in the line,
and a new summation and threshold decision is achieved every 0.1 μsec.

Thus, one threshold element performs an analog summation of 100 variables
in 0.1 μsec, for a total of 10^9 analog variables summed per second. A single
object recognition may depend on the parallel operation of 100 of these
threshold summers; thus 10^11 analog variables are summed per second and
subjected to 10^9 separate subdecisions per second in order to scan the
photograph for a single candidate object.
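The throughput figures follow directly from the three numbers in the paragraph above, as this short calculation shows. Integer arithmetic is used so the powers of ten come out exactly; the variable names are illustrative.

```python
# Throughput of the delay-line cross-correlator, using the figures in the text.
WEIGHTED_ELEMENTS = 100             # weighted taps feeding one threshold summer
DECISIONS_PER_SECOND = 10 ** 7      # one new summation every 0.1 microsecond
SUMMERS_PER_OBJECT = 100            # threshold summers operating in parallel

variables_per_summer = WEIGHTED_ELEMENTS * DECISIONS_PER_SECOND
variables_per_object = SUMMERS_PER_OBJECT * variables_per_summer
subdecisions_per_object = SUMMERS_PER_OBJECT * DECISIONS_PER_SECOND

for label, value in [("variables summed per second, one summer", variables_per_summer),
                     ("variables summed per second, one object", variables_per_object),
                     ("threshold subdecisions per second, one object", subdecisions_per_object)]:
    print(f"{label}: {value:.0e}")
```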
Figure 5-5 Organization of Recognition Logic
Although the object recognition equipment is described here in terms
of a delay-line cross-correlator, a shift-register cross-correlator could be
used as well. Appendix G describes an optically tapped acoustic delay line as
a cross-correlator for object recognition applications. The optically tapped
delay line appears at this time to be the best choice, for the following reasons:

• The basic cost per buffered storage element is believed to be an
order of magnitude less for the acoustic delay line: $5 per bit
versus $50 per bit.

• Changing the weights to the summing thresholds for different recognition
problems involves merely changing photographic slides in the case of the
optically tapped delay line. For a lumped-constant line or a shift register,
however, it involves changing a wiring assembly that may cost $10,000 to
$20,000.

• The assembly of fiber optics and associated weights is presently
believed to be more easily achieved than a wiring matrix that
permits any of 100 or more thresholds to have access to any of
1000 taps.

• Gray-scale information can be handled more readily in the delay
line.

The system described thus far will be capable of performing an elementary
image screening function, i.e., recognizing and marking elementary
objects that can be represented with sufficient accuracy in a 32 x 32 element
retinal array. The device scans the film continuously and marks such simple
objects as tanks, trucks, helicopters, and tents as they are encountered. By
merely changing the size of the scanning standards used, any object or object
complex for which 32 x 32 element resolution is sufficient for recognition can
be marked in the same way.

5.4 Off-Line Display for Photo-Interpreter

To make use of the output of the screening device, the photo-interpreter
requires some viewing mechanism that permits extended viewing of any one
frame while the image screening apparatus continues processing photographs.
One approach is to mark individual recognitions on a magnetic tape that
passes under the recording head in synchronism with the processed film as it
passes through the screening apparatus. The marking indicates target type and
target coordinates in the 9- by 9-inch frame. When the entire roll has been
processed on the screening equipment, the roll and its associated magnetic
record can be placed on a viewing table for inspection by the photo-interpreter.
In the design considered here, the 9- by 9-inch film is illuminated
from the rear by a diffuse, uniform light for viewing. At the same time, the
prescreened target locations are indicated by brighter illumination in a local
area, and, if desired, different colors of local illumination can be used to
indicate different target types. Accurately placed local illumination can be
achieved by means of a matrix of small light bulbs placed behind the film. A
particular bulb would be activated when the digital coordinates of a detected
target location correspond to the location of that bulb.
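One way the coordinate-to-bulb correspondence might be computed is sketched below. It assumes the 32 x 32 bulb matrix listed in Table 5-1, a square grid of cells over the 9- by 9-inch frame, and target coordinates recorded in inches from one corner; the function name and coordinate convention are hypothetical.

```python
from typing import Tuple

GRID = 32            # bulbs per side (Table 5-1)
FRAME_INCHES = 9.0   # the film frame is 9 by 9 inches


def bulb_for(x_in: float, y_in: float, grid: int = GRID,
             frame: float = FRAME_INCHES) -> Tuple[int, int]:
    """Return the (row, column) of the bulb lying under a target at (x, y).

    Coordinates are in inches from one corner of the frame (an assumed
    convention). Each bulb covers a square cell of frame/grid inches.
    """
    if not (0.0 <= x_in <= frame and 0.0 <= y_in <= frame):
        raise ValueError("target coordinates lie outside the frame")
    cell = frame / grid
    col = min(int(x_in / cell), grid - 1)   # a target on the far edge maps to the last bulb
    row = min(int(y_in / cell), grid - 1)
    return row, col


print(bulb_for(4.5, 0.1))
```

With these assumptions each bulb marks a cell 9/32 inch (about 0.28 inch) on a side, which bounds how precisely the display can point to a target.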

A relatively small amount of logic built into the viewing table can give
the photo-interpreter considerable freedom of selection. For example, a
small number of push-button controls can be arranged on the viewing console
to make the following options available to the P.I.:

• Advance the film one frame at a time whenever the advance button
is depressed.

• Advance the film to the next frame not obscured by cloud cover
when the advance button is depressed.

• Advance the film to the next frame containing a detected target
when the advance button is depressed.

• Advance the film to the next frame containing a particular target
type (e.g., tank, truck, mobile gun emplacement, helicopter, tent).

The following optional display modes are possible for concentrating attention
on particular targets:

• Display the film with uniform illumination and no colored illumination
for target designation.

• Display the film with all detected targets illuminated in their
appropriate color codes.

• Display the film with particular targets or combinations of targets
marked (selection to be made by panel controls).

For the photo-interpreter to make the most effective use of the image
screening equipment, he must have the option of varying the recognition
threshold for various target candidates. If the threshold is set too low, the
P.I. may lose important time searching through an excessive number of false
alarms; if it is set too high, too many targets may be missed. Thus, if hunting
for a particular target such as an expected armor concentration, he might set
the threshold quite high and quickly scan the available photography for
recognitions. Failing to find the required target on the first try, he has the
option of lowering the threshold and trying again.


One way to achieve the versatility described above is to make individual
threshold controls available at the console of the screening equipment.
This technique, however, has the disadvantage of tying up the screening
equipment while repeated scans are made on the film with different threshold
settings. A better approach is to record target identifications together with
a digital number that indicates the relative degree of certainty associated
with each identification. The threshold for target designation to the P.I. can
then be implemented by logic contained within the viewing console. If searching
for armor, the P.I. might then set red lights to mark armor and blue lights to
mark possible tracks. The threshold controls on the viewing console would be set
to a relatively high value for both indicators. The film-advance mechanism
would be set to advance rapidly, stopping automatically on all frames
containing either of the two chosen recognitions with probabilities exceeding
the preset threshold value. In each case, a touch of the advance button by the
P.I. would advance the film to the next frame containing a recognition of the
required probability. If the number of valid recognitions verified by the P.I.
lies within the expected range, the P.I. might elect to go on to another roll of
photography; if not, he might prefer to reinspect the film with the designation
thresholds set to respond to those identifications lying below the previous
threshold and above a new one. This process could continue until the number
and type of false alarms made further scrutiny seem unwarranted.
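The viewing-console logic described above reduces to a filter over the recorded recognitions. The sketch below assumes each recognition is recorded as a frame number, a target type, and an integer certainty number; the record layout, function name, and example roll are hypothetical. A second pass with `low` set to a new threshold and `high` set to the previous one reproduces the reinspection step.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Recognition(NamedTuple):
    frame: int        # frame number on the roll
    target: str       # recorded target type
    certainty: int    # recorded digital certainty number


def next_frame(records: Iterable[Recognition], current: int, targets: Set[str],
               low: int, high: Optional[int] = None) -> Optional[int]:
    """Frame to stop on after `current`, or None at the end of the roll.

    A frame qualifies if it holds a recognition of a selected target type
    whose certainty is at least `low` and, when `high` is given, below
    `high` (used to re-inspect the band under a previous threshold).
    """
    stops = [r.frame for r in records
             if r.frame > current and r.target in targets
             and r.certainty >= low and (high is None or r.certainty < high)]
    return min(stops, default=None)


# Hypothetical magnetic-tape record for one roll.
roll = [Recognition(3, "armor", 7), Recognition(5, "tracks", 4),
        Recognition(8, "armor", 9), Recognition(9, "truck", 8)]
selected = {"armor", "tracks"}

first_pass = []                      # high threshold: strong recognitions only
frame = next_frame(roll, 0, selected, low=7)
while frame is not None:
    first_pass.append(frame)
    frame = next_frame(roll, frame, selected, low=7)
print("first pass stops:", first_pass)
print("second pass first stop:", next_frame(roll, 0, selected, low=4, high=7))
```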

5.5 Extended Search Capability for Special Situations

A stored-program digital computer can be programmed, given sufficient
memory and time, to solve very difficult pattern recognition tasks. Direct
use of the computer on real-time pattern recognition tasks is usually
not practical because excessive time is required to process all of the picture
data in its many possible combinations. However, if the object recognition
system described thus far is used to screen the data entering the computer so
that only the more interesting and promising areas are considered, computation
times may fall within practical bounds. In general, the computer could be used
in two ways.

First, local areas in the photograph already designated as probable
target objects by the high-speed recognition device could be subjected to
further scrutiny involving relatively elaborate sequential routines in order to
verify recognition and classification. To do this, both gray-scale and gradient
data for the candidate area could be read into computer memory for programmed
scrutiny. Typically, contours may be followed, different thresholds tried to
verify the presence of an obscure but necessary feature, and even local-area
Fourier spectrum components measured. Computation time would be relatively
short in this process for the following reasons:

• The object has been pre-centered and pre-rotated in the field by the
preceding registry-testing recognition logic.

• Usually only those logical tests necessary for verifying the existence of a
particular object need be tried. If recognition is not verified, then tests
for the next most probable identifications can follow.

A second way in which the computer may be used is in the recognition
of target complexes. Verified individual object recognitions and their
coordinates may be considered jointly in their various combinations in order
to recognize object complexes such as missile launching sites or multiple gun
emplacements. In this mode of operation, the computer program could consider
the various detected objects within a given area as a property list and
then hypothesize a candidate target complex. Further verification could then
take place with the scanner under the command of the computer. For example,
the scanner-computer combination could be used to attempt to find and trace
paths or roads joining gun emplacements and possible ammunition or fueling
depots. Such sequential techniques are essential when the total complex
encompasses more picture elements (at the resolution required for recognition)
than can be held in memory at any one time.

5.6 Control Console and System

The following adjustments can reasonably be expected to be available
at the control console:

• Classes of target objects to be included in the search

• Scale factor adjustment for the photograph, plus the size of the
scanning pattern

• Search time to be spent per frame

• False alarm rate and detection probability, by threshold adjustment

Permanently wiring all feature and object recognition logic and
weights for all the target classes likely to be used in screening would lead to
almost prohibitive complexity. The equipment may well be constructed with
plug-in capability so that the same basic cross-correlator, threshold
circuits, and digital logic can be used for a wide variety of tactical screening
problems. Thus, the first-layer feature-template weights (see Figure 5-1),
the second-layer decision weights, and the output code used to record the
identifications should all be changeable by means of plug-in modules. In
the case of the optically tapped delay-line cross-correlator, the first-layer
weights are readily changed by changing a number of photographic slides. The
second-layer weights may involve 40 weights per object class per angle of
rotation. A separate plug-in resistor matrix for each angle of rotation would
represent a practical solution to the problem.
When all target objects for screening are roughly the same size,
it is necessary only to select a single scanning pattern size, determined by the
scale of the photography to be processed. When the altitude of flight is
approximately fixed for the roll of film to be processed, the scale factor can be
set at the console. If, however, altitude is expected to vary significantly over
the roll, then it becomes necessary to mark the scale factor at frequent
intervals on the margin of the film. This can be done manually using a simple
digital code punched into the edge of the film, or it may be done automatically
during the flight. In either case, the punched marking can be read off the film
edge automatically and used to set the scale factor of the scan.

If H is the greatest scanning size to be used, then the other scanning sizes
are chosen to be integer fractions of H (H/2, H/3, ...). For the larger scans,
correspondingly larger horizontal pitches of the sawtooth scan and larger spot
sizes are also used. During any particular scan, only those object recognition
circuits pertaining to that particular scale factor are permitted to record
recognition decisions. If scales for scanning vary by factors of two, the total
time required to scan at the smallest scale and at all larger scales is only
4/3 of that required for the smallest scale alone. When the height and pitch
are doubled, the density of picture element samples is 1/4 of the original
value, and

$$\sum_{n=0}^{\infty} \left(\frac{1}{4}\right)^{n} = \frac{4}{3}.$$
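The 4/3 figure can be confirmed with exact rational arithmetic. The sketch assumes, as in the text, that each larger scale doubles both the ribbon height and the sawtooth pitch, so that its scan time is one quarter of that of the next smaller scale; the function name is illustrative.

```python
from fractions import Fraction


def relative_scan_time(levels: int) -> Fraction:
    """Total scan time for `levels` scale factors, relative to the smallest.

    Each larger scale doubles the ribbon height and the sawtooth pitch, so it
    needs 1/4 of the picture-element samples (and time) of the one before.
    """
    return sum((Fraction(1, 4) ** n for n in range(levels)), Fraction(0))


for levels in (1, 2, 3, 4):
    print(levels, relative_scan_time(levels))   # approaches 4/3 from below
```

Three scale factors already require 21/16 of the smallest-scale time, so in practice the 4/3 limit is nearly reached after only a few levels.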

Depending upon the amount of imagery to be processed and the reaction time
required, additional scans may or may not be desirable. Therefore, it should be
possible for the equipment to process photography in two scanning modes: a
fast mode, in which each scale factor is covered only once, and a slow mode,
in which the image is scanned several times at each major scale factor. The
choice between these modes will, of course, be made from the control console
as determined by the photo-interpreter.

If, as an economic compromise, the number of scanning lines and
video sampling points is less than the theoretical minimum required, a kind
of noise (sometimes called quantization noise) results in the recognition
circuits, producing different recognition results on nearly identical scans.
The effect of this noise is reduced by taking several scans at incrementally
different sizes and averaging the outputs of the discriminant function.

The scale of particular photography is known only within a specified
range. Scans at several incrementally different scale factors will ensure
coverage of the scale factor at which the recognition logic most nearly matches
the input imagery. This procedure of size search also helps to compensate
for slight variations in object size.

To reduce film processing time, the equipment could be programmed,
on detecting cloud cover, to sample several other areas of the frame
rapidly for cloud cover using only the largest scanning frame. If all samples
were recognized as cloud cover, the equipment would proceed to the next
frame without further scanning. A similar decision could be made for water.
Each of these modes could be used or not used at the discretion of the
individual operating the equipment. For example, when searching for ships,
the water-reject mode would not be appropriate. The mode of operation is
selected from the control console.
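The cloud- and water-reject decision can be stated as a small rule over the labels returned by the rapid large-scale samples. The label strings, the mode set, and the function name below are hypothetical; the sketch assumes, as in the text, that each reject mode is judged separately, so a frame is abandoned only when every sample carries the same enabled reject label.

```python
from typing import Iterable, Set


def should_skip_frame(sample_labels: Iterable[str], enabled_modes: Set[str]) -> bool:
    """Decide whether to abandon a frame after rapid large-scale sampling.

    The frame is skipped only if every sampled area received the same label
    and that label belongs to an enabled reject mode (e.g. "cloud", "water").
    """
    labels = list(sample_labels)
    if not labels:
        return False
    return any(all(label == mode for label in labels) for mode in enabled_modes)


print(should_skip_frame(["cloud"] * 4, {"cloud", "water"}))
print(should_skip_frame(["water"] * 4, {"cloud"}))   # ship search: water-reject off
```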

5.7 Estimate of System Complexity

No firm estimate of the complexity of an effective image screening
equipment can be made until a number of critical experiments have been
completed. These experiments are required to determine the following key
parameters:

• Number of stored picture elements necessary to achieve
recognition; present estimate is 2000.

• Number of picture elements that need to be weighted per
feature; present estimate is 10 to 50.

• Number of separate feature detectors needed to recognize an
object class in one small range of angles; present estimate is 40.

• Number of duplicate feature sets needed to account for rotation;
present estimate is 18.

• Number of features required to represent adequately a vocabulary
of 20 objects; present estimate is 6 times that required for one object.

Although the foregoing parameters have yet to be determined more
closely, a preliminary estimate of complexity based on the above figures
may be developed. Consider an image screening system using optically tapped
acoustic delay lines and capable of identifying 20 objects of tactical
significance. The equipment required to realize such a system is summarized
in Table 5-1, "Estimate of Screening System Complexity."
TABLE 5-1


ESTIMATE OF SCREENING SYSTEM COMPLEXITY


Requirement or Assumption                          Equipment
-------------------------------------------------  -----------------------
200 μsec delay line, 0.1 μsec rise time.           2 delay lines
Use two: one for gradient, one for gray-scale.

4320 feature detectors [6 x 18 x 40].              21,600 transistors
Each feature detector has 5 transistors,           4,320 photomultipliers
1 photomultiplier, 50 optical fibers.              216,000 optical fibers

10 x 9 + 10 x 18 = 270. Discriminant functions     32,400 diodes
are required on 40 pairs of variables:
10,800 AND gates, 3 diodes each.

270 thresholds, 5 transistors each                 1,350 transistors

Ten 9-legged OR gates plus                         270 diodes
ten 18-legged OR gates

Miscellaneous scanning and coding logic            1,000 transistors

Viewing monitor: 32 x 32 light bulb matrix         5,120 miniature bulbs
in five colors plus 1,000-transistor logic         320 relays
                                                   1,000 transistors
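The derived entries of Table 5-1 follow from its stated assumptions by simple multiplication, as the sketch below shows. Reading the 10,800 AND gates as 40 feature pairs for each of the 270 discriminant functions is an interpretation of the table, not something the table states explicitly; the dictionary keys are illustrative labels.

```python
# Recomputing the derived entries of Table 5-1 from its stated assumptions.
OBJECT_FACTOR = 6              # features for 20 objects = 6 x those for one object
ORIENTATIONS = 18              # duplicate feature sets for rotation
FEATURES_PER_ORIENTATION = 40  # feature detectors per object class per angle range

feature_detectors = OBJECT_FACTOR * ORIENTATIONS * FEATURES_PER_ORIENTATION
discriminants = 10 * 9 + 10 * 18       # as given in the table
and_gates = 40 * discriminants         # assumed: 40 feature pairs per discriminant

equipment = {
    "feature detectors": feature_detectors,
    "transistors (feature detectors)": 5 * feature_detectors,
    "photomultipliers": feature_detectors,
    "optical fibers": 50 * feature_detectors,
    "AND gates": and_gates,
    "diodes (AND gates)": 3 * and_gates,
    "transistors (thresholds)": 5 * discriminants,
    "diodes (OR gates)": 10 * 9 + 10 * 18,   # one diode per OR-gate leg
    "miniature bulbs": 32 * 32 * 5,          # 32 x 32 matrix in five colors
}
for item, count in equipment.items():
    print(f"{item:32s} {count:>9,}")
```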
SECTION 6


STATISTICAL METHODS FOR PATTERN CLASSIFICATION


6.1 Introduction


This section presents the main points concerning statistical classification
procedures and procedures for examining recorded data from the point of view
of the significance of predictor variables and the configuration of groups
in an N-dimensional space. These items are explained in greater detail in
Appendix H.

Intuitive classification procedures, based on concepts of distance
and direction and employing transformations of the coordinate space or
projections of the samples along a particular direction, were used from the
very beginning of multivariate analysis. In 1939, a probabilistic theory of
classification first appeared, when attention was focused on procedures that
would minimize the probability of misclassification. Present classification
procedures represent a synthesis of ideas of distance functions, or metrics,
with probabilistic ideas such as minimizing the probability of misclassification
or minimizing the expected loss of misclassification (References 1 through 4).

The basic assumption underlying probabilistic classification is that
there exists for each group $X_g$, $g = 1, 2, \ldots, G$, a multivariate probability
distribution $F_g(x_1, x_2, \ldots, x_N)$. Members of a pattern class are then
treated as samples from a population that is distributed in N-dimensional
space according to the distribution associated with that population. This
theoretical framework leads to three types of problems, which encompass
situations ranging from complete statistical knowledge of the distributions
to no knowledge except that which can be inferred from samples. The various
situations are discussed individually in the following text.

1. C. R. Rao, Advanced Statistical Methods in Biometric Research, John Wiley
and Sons, New York, 1952.

2. S. S. Wilks, "Multidimensional Statistical Scatter," Contributions to
Probability and Statistics in Honor of Harold Hotelling, Stanford University
Press, 1960.

3. Maurice M. Tatsuoka and David V. Tiedeman, "Discriminant Analysis," Review
of Educational Research, Vol. 24, pp. 402-420, 1954.

4. R. G. Miller, "Statistical Prediction by Discriminant Analysis," Meteorological
Monographs, Vol. 4, No. 25, October 1962, American Meteorological Society,
Boston, Mass.
6.2 Case of Known Distributions


Assuming that the probability density functions $f_g(x_1, x_2, \ldots, x_N)$,
$g = 1, 2, \ldots, G$, are known, having a new pattern to classify is the same
as having a new observation on the stochastic variable $X = (X_1, X_2, \ldots, X_N)$.
One must decide from which of the G distributions this particular observation
arose. Several variations can occur within this case.

The simplest situation occurs when there are only two groups, whose
probability density functions $f_1(x)$ and $f_2(x)$ are known, and the new pattern
must belong to one of these two groups.

In this situation, optimal classification is obtained by using the likelihood
ratio $L(x) = f_1(x)/f_2(x)$. The value of $x = (x_1, x_2, \ldots, x_N)$ for a new
pattern is substituted in $L(x)$ and the result compared with a threshold t. If
$L(x)$ exceeds t, the new pattern is classified as belonging to Group 1;
otherwise, it is classified into Group 2. The choice of the threshold t
depends on:

1. The criterion of optimality being used.

2. The degree of knowledge of the a priori probabilities of the groups, i.e.,
the proportion $q_1 : q_2$ ($q_1 + q_2 = 1$) of the two groups in the universe
from which patterns are drawn.

3. The costs of correct and incorrect classification.

Some of the decision criteria, such as Bayes and Maximum Likelihood, that
have been used to obtain values for the threshold t are presented in
Appendix H.
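Appendix H is not reproduced in this section, so the sketch below uses the standard Bayes rule as an illustration: with a priori proportions q1 and q2 and zero cost for correct decisions, expected loss is minimized by classifying into Group 1 when L(x) exceeds t = q2 c(1|2) / (q1 c(2|1)), where c(i|j) is the cost of assigning a Group j pattern to Group i. The univariate normal densities and all numbers are hypothetical.

```python
import math
from typing import Callable


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_threshold(q1: float, q2: float, cost_1_given_2: float = 1.0,
                    cost_2_given_1: float = 1.0) -> float:
    """Threshold t minimizing expected loss (zero cost for correct decisions).

    cost_1_given_2 is the cost of assigning a Group 2 pattern to Group 1;
    cost_2_given_1 is the cost of assigning a Group 1 pattern to Group 2.
    """
    return (q2 * cost_1_given_2) / (q1 * cost_2_given_1)


def classify(x: float, f1: Callable[[float], float], f2: Callable[[float], float],
             t: float) -> int:
    """Likelihood-ratio rule: Group 1 if f1(x)/f2(x) exceeds t, else Group 2."""
    return 1 if f1(x) / f2(x) > t else 2


def f1(x: float) -> float:
    return normal_pdf(x, 0.0, 1.0)   # hypothetical Group 1 density


def f2(x: float) -> float:
    return normal_pdf(x, 2.0, 1.0)   # hypothetical Group 2 density


t_equal = bayes_threshold(0.5, 0.5)  # equal priors and costs: t = 1
t_rare = bayes_threshold(0.2, 0.8)   # Group 1 rarer: t = 4
print([classify(x, f1, f2, t_equal) for x in (0.2, 0.9, 1.1)])
print([classify(x, f1, f2, t_rare) for x in (0.2, 0.5)])
```

For these two densities L(x) = exp(2 - 2x), so the boundary sits at x = 1 with equal priors and moves toward the Group 1 mean as Group 1 becomes rarer.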

If $f_1(x)$ and $f_2(x)$ are multivariate normal distributions with different
mean (column) vectors

$$M^{(g)} = \left(m_1^{(g)}, m_2^{(g)}, \ldots, m_N^{(g)}\right)', \qquad g = 1, 2,$$

and covariance matrices $V_g$, $g = 1, 2$, then the logarithm of the likelihood
ratio leads to a linear function of the observables when

$$V_1 = V_2$$

and to a quadratic function of the observables when

$$V_1 \neq V_2.$$

Thus, when

$$V_1 = V_2 = V,$$

one has, in matrix notation:


$$\log L(x) = x' V^{-1}\left(M^{(1)} - M^{(2)}\right) - \tfrac{1}{2}\left(M^{(1)} + M^{(2)}\right)' V^{-1}\left(M^{(1)} - M^{(2)}\right) \tag{6-1}$$

where $x'$ is the transpose of the column vector $x = (x_1, x_2, \ldots, x_N)'$
of measurements made on the pattern to be classified;

$V^{-1}$ is the inverse of the (common) covariance matrix $V$, which is an
$N \times N$ matrix with elements

$$v_{ij} = E\left[(x_i - m_i)(x_j - m_j)\right];$$

$M^{(1)}$ is the mean vector for Group 1; and

$M^{(2)}$ is the mean vector for Group 2.

If one lets

$$C = \log t + \tfrac{1}{2}\left(M^{(1)} + M^{(2)}\right)' V^{-1}\left(M^{(1)} - M^{(2)}\right), \tag{6-2}$$
the  likelihood  ratio  principle  in  this  case  leads  to  the  classification  of  the  new
pattern  into  one  of  two  regions  $R_1$  and  $R_2$  in  the  N-dimensional  space,  separated
by  a  hyperplane  given  by

$$x' V^{-1}\left(M^{(1)} - M^{(2)}\right) = C,$$

or  in  expanded  form  by  the  equation

$$\sum_{i=1}^{N}\left(v^{1i} d_1 + v^{2i} d_2 + \cdots + v^{Ni} d_N\right) x_i = C, \qquad \text{(6-3)}$$

where  $v^{ij}$,  $i, j = 1, 2, \ldots, N$,  are  the  elements  of  the  inverse  matrix  $V^{-1}$,  and
$d_j = m_j^{(1)} - m_j^{(2)}$,  $j = 1, 2, \ldots, N$,  is  the  difference  between  the  mean  values  of  $x_j$  in  the  two
groups.  If  one  lets

$$a_i = v^{1i} d_1 + v^{2i} d_2 + \cdots + v^{Ni} d_N, \qquad \text{(6-4)}$$

then  Equation  6-3  can  be  rewritten  as

$$\sum_{i=1}^{N} a_i x_i = C. \qquad \text{(6-5)}$$


The  coefficients  $a_i$  of  Equations  6-4  and  6-5  depend  only  on  the  known  means  and
covariances  of  the  two  groups;  the  constant  C  in  Equation  6-5  depends,  in  addition,  on  the
threshold  t.  If,  when  the  values  of  $x_i$  measured  on  a  new  pattern  are  substituted  in  Equation  6-5,
the  left-hand  side  exceeds  C,  the  pattern  is  classified  into  Group  1.  If  the
left-hand  side  is  less  than  C,  the  pattern  is  classified  into  Group  2.
If  the  left-hand  side  equals  C,  the  decision  is  arbitrary.
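A minimal sketch of Equations 6-2 to 6-5, assuming the common covariance matrix V and the two mean vectors are known exactly. Because $V^{-1}$ is symmetric, the coefficients $a_i$ of Equation 6-4 are the components of $V^{-1}d$, which the sketch obtains by solving $Va = d$ rather than forming the inverse. The helper names (`solve`, `linear_rule`, `classify`) are illustrative, not part of the report.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def linear_rule(m1, m2, V, t=1.0):
    """Coefficients a_i (Eq. 6-4) and constant C (Eq. 6-2) for V1 = V2 = V."""
    d = [u - v for u, v in zip(m1, m2)]           # d_j = m_j(1) - m_j(2)
    a = solve(V, d)                               # a = V^{-1} d
    mid = [(u + v) / 2 for u, v in zip(m1, m2)]   # (M(1) + M(2)) / 2
    C = math.log(t) + sum(mi * ai for mi, ai in zip(mid, a))
    return a, C


def classify(x, a, C):
    """Eq. 6-5: Group 1 if sum a_i x_i > C, Group 2 if < C, None on the hyperplane."""
    score = sum(ai * xi for ai, xi in zip(a, x))
    if score > C:
        return 1
    if score < C:
        return 2
    return None  # the decision is arbitrary


a, C = linear_rule([2.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(a, C)                          # [2.0, 0.0] 2.0
print(classify([1.5, 0.0], a, C))    # 1
print(classify([0.5, 3.0], a, C))    # 2
```

Solving the linear system instead of inverting V is a numerical convenience; any routine that yields $V^{-1}d$ gives the same hyperplane.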


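If  the  covariance  matrices  for  the  two  groups  are  not  equal,  i.e.,
$V_1 \neq V_2$,  then  instead  of  the  hyperplane  of  Equation  6-5,  the  optimum
decision  boundary,  given  by  the  surface  of  constant  likelihood  ratio,  is

$$\sum_{i=1}^{N}\sum_{j=1}^{N}\left[v_1^{ij}\left(x_i - m_i^{(1)}\right)\left(x_j - m_j^{(1)}\right) - v_2^{ij}\left(x_i - m_i^{(2)}\right)\left(x_j - m_j^{(2)}\right)\right] = \text{constant} \qquad \text{(6-6)}$$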
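where  $v_1^{ij}$  and  $v_2^{ij}$  are  the  elements  of  $V_1^{-1}$  and  $V_2^{-1}$  respectively.  Equation  6-6  may  be  written  in  the  form

$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} x_i x_j + \sum_{i=1}^{N} b_i x_i = \text{constant}, \qquad \text{(6-7)}$$

where

$$a_{ij} = v_1^{ij} - v_2^{ij} \qquad \text{(6-8)}$$

and

$$b_i = -2\left[\left(m_1^{(1)} v_1^{i1} - m_1^{(2)} v_2^{i1}\right) + \left(m_2^{(1)} v_1^{i2} - m_2^{(2)} v_2^{i2}\right) + \cdots + \left(m_N^{(1)} v_1^{iN} - m_N^{(2)} v_2^{iN}\right)\right]. \qquad \text{(6-9)}$$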
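The reduction from Equation 6-6 to Equations 6-7 through 6-9 can be checked numerically. The sketch below assumes both mean vectors and both covariance matrices are known; it computes $a_{ij}$ and $b_i$ and confirms that the quadratic function differs from the left-hand side of Equation 6-6 only by a term that does not depend on x (namely $M^{(1)\prime} V_1^{-1} M^{(1)} - M^{(2)\prime} V_2^{-1} M^{(2)}$). Function names such as `quadratic_coefficients` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def inverse(A):
    """Matrix inverse, assembled column by column from solutions of A x = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def quadratic_coefficients(m1, m2, V1, V2):
    """a_ij (Eq. 6-8) and b_i (Eq. 6-9), plus the two inverse matrices."""
    P1, P2 = inverse(V1), inverse(V2)          # elements v1^{ij}, v2^{ij}
    n = len(m1)
    a = [[P1[i][j] - P2[i][j] for j in range(n)] for i in range(n)]
    b = [-2.0 * sum(m1[j] * P1[i][j] - m2[j] * P2[i][j] for j in range(n))
         for i in range(n)]
    return a, b, P1, P2


def quadratic_function(x, a, b):
    """Left-hand side of Eq. 6-7."""
    n = len(x)
    return (sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def eq_6_6(x, m1, m2, P1, P2):
    """Left-hand side of Eq. 6-6."""
    u = [xi - mi for xi, mi in zip(x, m1)]
    w = [xi - mi for xi, mi in zip(x, m2)]
    n = len(x)
    return sum(P1[i][j] * u[i] * u[j] - P2[i][j] * w[i] * w[j]
               for i in range(n) for j in range(n))


m1, m2 = [1.0, 2.0], [0.0, -1.0]
V1, V2 = [[2.0, 0.5], [0.5, 1.0]], [[1.0, -0.3], [-0.3, 3.0]]
a, b, P1, P2 = quadratic_coefficients(m1, m2, V1, V2)
for x in ([0.0, 0.0], [1.5, -2.0], [3.0, 4.0]):
    # The printed difference is the same for every x.
    print(round(eq_6_6(x, m1, m2, P1, P2) - quadratic_function(x, a, b), 10))
```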


The  expressions  of  Equations  6-5  and  6-7  will  be  referred  to  as  linear
and  quadratic  classification  functions.  They  can  be  shown  to  be  theoretically
optimal,  with  different  values  for  the  coefficients,  for  a  number  of  types  of
density  functions  in  addition  to  normal  density  functions.  Interest  in  linear
and  quadratic  classification  functions  also  stems  from  considering  them  as
first-  and  second-order  approximations  to  arbitrary  likelihood  ratios,  since
in  many  situations  they  represent  the  most  that  can  be  realized  in  hardware
or  by  computation.  As  an  example,¹  for  the  case  where  the  $x_i$  are  binary
variables,  it  is  shown  in  Appendix  H  that  the  likelihood  ratio  of  two  distributions  of
N  binary  variables  becomes:


1.  R.  R.  Bahadur,  "On  Classification  Based  on  Responses  to  n  Dichotomous
Items,"  USAF  SAM  Series  in  Statistics,  Randolph  AFB,  Texas,  1959;  also
appears  in  Studies  in  Item  Analysis  and  Prediction,  Herbert  Solomon  (ed.),
Stanford  University  Press,  1961.
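$$L(x) = \frac{\prod_{i=1}^{N}\left(m_i^{(1)}\right)^{x_i}\left(1 - m_i^{(1)}\right)^{1 - x_i}}{\prod_{i=1}^{N}\left(m_i^{(2)}\right)^{x_i}\left(1 - m_i^{(2)}\right)^{1 - x_i}} \cdot \frac{1 + \sum_{i<j} r_{ij}^{(1)} y_i^{(1)} y_j^{(1)} + \sum_{i<j<k} r_{ijk}^{(1)} y_i^{(1)} y_j^{(1)} y_k^{(1)} + \cdots + r_{12\ldots N}^{(1)} y_1^{(1)} y_2^{(1)} \cdots y_N^{(1)}}{1 + \sum_{i<j} r_{ij}^{(2)} y_i^{(2)} y_j^{(2)} + \sum_{i<j<k} r_{ijk}^{(2)} y_i^{(2)} y_j^{(2)} y_k^{(2)} + \cdots + r_{12\ldots N}^{(2)} y_1^{(2)} y_2^{(2)} \cdots y_N^{(2)}} \qquad \text{(6-10)}$$

where,  with  $x_i = 0$  or  $1$,

$$m_i^{(g)} = P_g(x_i = 1), \qquad 0 < m_i^{(g)} < 1,$$

$$y_i^{(g)} = \frac{x_i - m_i^{(g)}}{\sqrt{m_i^{(g)}\left(1 - m_i^{(g)}\right)}}, \qquad i = 1, 2, \ldots, N; \quad g = 1, 2,$$

$$r_{ij}^{(g)} = E_g\left(y_i^{(g)} y_j^{(g)}\right),\ i < j; \qquad r_{ijk}^{(g)} = E_g\left(y_i^{(g)} y_j^{(g)} y_k^{(g)}\right),\ i < j < k; \qquad \ldots; \qquad r_{12\ldots N}^{(g)} = E_g\left(y_1^{(g)} y_2^{(g)} \cdots y_N^{(g)}\right), \qquad \text{(6-11)}$$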

with  $E_g$  denoting  the  expectation  taken  with  respect  to  the  probability  function
of  group  g,  g  =  1, 2.  The  $r_{ij}$,  $r_{ijk}$,  ...  are  the  "correlation  parameters."  A
second  order  approximation  to  the  logarithm  of  L(x)  of  Equation  6-10  (see
Appendix  H)  is

$$\log L(x) \approx \sum_{i=1}^{N}\left(a_i x_i + c_i\right) + \log\left[1 + \sum_{i<j} r_{ij}^{(1)} y_i^{(1)} y_j^{(1)}\right] - \log\left[1 + \sum_{i<j} r_{ij}^{(2)} y_i^{(2)} y_j^{(2)}\right] \qquad \text{(6-12)}$$

where

$$a_i = \log\frac{m_i^{(1)}\left(1 - m_i^{(2)}\right)}{m_i^{(2)}\left(1 - m_i^{(1)}\right)}$$

and

$$c_i = \log\frac{1 - m_i^{(1)}}{1 - m_i^{(2)}}.$$


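When  the  "correlation  parameters"  are  small,  one  can  use  $\log(1 + u) \approx u$  to  obtain  the  further
approximation

$$\log L(x) \approx \sum_{i<j} a_{ij} x_i x_j + \sum_{i=1}^{N} b_i x_i + \text{constant}, \qquad \text{(6-13)}$$

where

$$a_{ij} = u_{ij}^{(1)} - u_{ij}^{(2)},$$

with

$$u_{ij}^{(g)} = \frac{r_{ij}^{(g)}}{\sqrt{m_i^{(g)}\left(1 - m_i^{(g)}\right) m_j^{(g)}\left(1 - m_j^{(g)}\right)}}, \qquad g = 1, 2,$$

and

$$b_i = a_i + \sum_{j \neq i}\left(-m_j^{(1)} u_{ij}^{(1)} + m_j^{(2)} u_{ij}^{(2)}\right),$$

with  $u_{ij}^{(g)} = u_{ji}^{(g)}$  and

$$a_i = \log\frac{m_i^{(1)}\left(1 - m_i^{(2)}\right)}{m_i^{(2)}\left(1 - m_i^{(1)}\right)}.$$

The  constant  does  not  depend  on  x  and  can  be  absorbed  into  the  threshold.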

From  Equation  6-10  it  is  apparent  that  if  the  number  of  variables  N  is  moderately
large,  a  second  order  approximation,  as  represented  by  Equations  6-12
or  6-13,  is  as  much  as  one  could  hope  for  in  general  in  the  case  of  binary
variables,  since  the  full  expansion  involves  $2^N - N - 1$  correlation  parameters  for  each  group,
a  number  that  grows  exponentially  with  N.  This  does  not,  of  course,  prevent  consideration  of  selected  higher
order  correlations  in  the  approximation.
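The following sketch computes the coefficients of Equation 6-13 from the marginal probabilities $m_i^{(g)}$ and the second-order correlation parameters $r_{ij}^{(g)}$, which are assumed known here (in practice they would be estimated from samples). It also returns the constant term, so the result can be compared with Equation 6-12 after $\log(1+u)$ is replaced by $u$. The function names and the dictionary layout for $r_{ij}$ are illustrative assumptions.

```python
import math
from itertools import combinations


def binary_quadratic(m1, m2, r1, r2):
    """Coefficients of Eq. 6-13 for N binary variables.

    m1, m2 : lists of P_g(x_i = 1) for groups 1 and 2
    r1, r2 : dicts {(i, j): r_ij} with i < j; missing pairs are taken as 0
    Returns (A, b, const) such that
        log L(x) ~ sum_{i<j} A[(i, j)] x_i x_j + sum_i b[i] x_i + const.
    """
    n = len(m1)
    pairs = list(combinations(range(n), 2))
    sd1 = [math.sqrt(p * (1 - p)) for p in m1]
    sd2 = [math.sqrt(p * (1 - p)) for p in m2]
    u1 = {(i, j): r1.get((i, j), 0.0) / (sd1[i] * sd1[j]) for i, j in pairs}
    u2 = {(i, j): r2.get((i, j), 0.0) / (sd2[i] * sd2[j]) for i, j in pairs}
    A = {p: u1[p] - u2[p] for p in pairs}                          # a_ij
    a = [math.log(m1[i] * (1 - m2[i]) / (m2[i] * (1 - m1[i]))) for i in range(n)]
    b = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if j != i:
                p = (min(i, j), max(i, j))                        # u_ij = u_ji
                s += -m1[j] * u1[p] + m2[j] * u2[p]
        b.append(a[i] + s)                                         # b_i
    const = sum(math.log((1 - m1[i]) / (1 - m2[i])) for i in range(n))  # sum of c_i
    const += sum(u1[(i, j)] * m1[i] * m1[j] - u2[(i, j)] * m2[i] * m2[j]
                 for i, j in pairs)
    return A, b, const


def score(x, A, b, const):
    """Approximate log L(x) of Eq. 6-13 for a binary pattern x."""
    return (sum(A[(i, j)] * x[i] * x[j] for i, j in A)
            + sum(bi * xi for bi, xi in zip(b, x)) + const)


m1, m2 = [0.7, 0.4, 0.6], [0.3, 0.5, 0.2]
r1, r2 = {(0, 1): 0.05, (1, 2): -0.04}, {(0, 2): 0.03}
A, b, const = binary_quadratic(m1, m2, r1, r2)
print(score([1, 0, 1], A, b, const))   # approximate log L(x) for x = (1, 0, 1)
```

When all correlation parameters are zero the score reduces to the exact log likelihood ratio of two independent-component distributions, which is a convenient sanity check.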
For  the  case  where  there  are  more  than  two  groups,  if  the  a  priori  probabilities
$q_g$  and  the  costs  of  classification  $k_{jg}$  (the  cost  of  assigning  to  group  j  a  pattern
which  belongs  to  group  g)  are  known,  then  according  to  the  Bayes  criterion  the  G  functions

$$R(j) = \sum_{g=1}^{G} f_g(x)\, q_g\, k_{jg}, \qquad j = 1, 2, \ldots, G \qquad \text{(6-14)}$$

need  to  be  computed.  The  new  pattern  is  assigned  to  that  group  j  for  which
R(j)  is  smallest.  If  all  a  priori  probabilities  are  taken  to  be  equal,  and  all  costs  of
incorrect  classification  are  equal  while  correct  classification  costs  nothing,
then  for  a  given  x  one  need  only  compare  $f_g(x)$,  g  =  1, 2, ..., G,  and  pick
the  group  g  for  which  $f_g(x)$  is  largest.  In  general,  partitioning  the  N-dimensional
space  into  G  regions  $R_g$,  g  =  1, 2, ..., G,  requires  the  determination  of
G(G-1)/2  boundaries,  given  by  one  likelihood  ratio  for  each  pair  of  groups.
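A minimal sketch of the Bayes rule of Equation 6-14, taking the density values $f_g(x)$ at the new pattern as already computed. With equal priors and zero-one costs it picks the largest $f_g(x)$; making it expensive to miss group 3 shifts the decision. The numbers and the function name `bayes_assign` are made up for illustration.

```python
def bayes_assign(f, q, k):
    """Assign a pattern by the Bayes criterion of Eq. 6-14.

    f[g]    : density value f_g(x) at the new pattern
    q[g]    : a priori probability of group g
    k[j][g] : cost of assigning to group j a pattern that belongs to group g
    Returns (chosen group numbered from 1, list of R(j)).
    """
    G = len(f)
    R = [sum(f[g] * q[g] * k[j][g] for g in range(G)) for j in range(G)]
    j_best = min(range(G), key=lambda j: R[j])
    return j_best + 1, R


f = [0.1, 0.3, 0.2]
q = [1 / 3] * 3
zero_one = [[0 if j == g else 1 for g in range(3)] for j in range(3)]
print(bayes_assign(f, q, zero_one)[0])         # 2: the largest f_g(x) wins
costly_miss_3 = [[0, 1, 10], [1, 0, 10], [1, 1, 0]]
print(bayes_assign(f, q, costly_miss_3)[0])    # 3
```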


6.3  Case  of  Parametric  Families  of  Distributions


If  a  classification  procedure  is  optimal  with  respect  to  certain
criteria  when  the  functional  form  and  all  parameters  of  the  distributions  are
known,  then  how  "good"  is  the  procedure  when  parameters  estimated  from  samples  are  used
instead,  and  what  sort  of  estimates  should  be  used  to  obtain  a  good  classification  procedure?

The  performance  of  a  classification  procedure  using  estimated
parameters  should  be  consistent  with,  i.e.,  tend  to,  the  optimal  performance
obtained  when  true  parameter  values  are  used.  Similarly,  estimators  are
said  to  be  consistent  if  the  estimated  value  of  a  parameter  approaches  the
true  value  of  the  parameter  with  probability  1  as  the  sample  size  is  increased
indefinitely.  The  consistency  property,  as  defined  here,  will  be  referred  to
frequently  in  the  discussion  which  follows.

For  classification  into  one  of  two  groups,  it  has  been  shown¹  that
if  $f_1(x, \theta)$  and  $f_2(x, \theta)$  satisfy  certain  weak  restrictions  as  to  continuity  with
respect  to  the  parameter  $\theta$,  then  using  consistent  estimates  instead  of  true
parameter  values  in  the  likelihood  ratio  procedure  will  lead  to  a  consistent
classification  procedure.  For  classification  of  a  multivariate  observation
into  one  of  G  groups  whose  density  functions  are  known  except  for  a  number

1.  E.  Fix  and  J.  L.  Hodges,  "Discriminatory  Analysis:  nonparametric  discrimination:
consistency  properties,"  Report  No.  4,  USAF  School  of
Aviation  Medicine,  Randolph  Field,  Texas,  1951.
of  parameters,  if  consistent  estimators  are  used  for  the  a  priori  probabilities
$q_g$  and  for  the  unknown  parameters,  it  has  been  shown¹  that  the  Bayes
procedure,  with  unknown  parameters  replaced  by  estimates,  is  a  consistent
classification  procedure.

When  sample  sizes  are  finite,  and  especially  when  they  are
small,  the  use  of  maximum  likelihood  estimates  of  parameters  in  a  likelihood
ratio  procedure  or  Bayes  procedure,  in  place  of  the  unknown  true  parameter
values,  can  only  be  justified  on  heuristic  grounds.  Nevertheless,  it  is  usually
done.  Thus  for  the  case  of  two  normal  density  functions  with  unknown  mean
vectors  and  an  unknown  common  covariance  matrix,  one  has,  instead  of  Equation  6-1,  the
following  expression:


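$$\log L(x) = x' S^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right) - \tfrac{1}{2}\left(\bar{x}^{(1)} + \bar{x}^{(2)}\right)' S^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right), \qquad \text{(6-15)}$$

where

$$\bar{x}^{(g)} = \left(\bar{x}_1^{(g)}, \bar{x}_2^{(g)}, \ldots, \bar{x}_N^{(g)}\right)', \qquad g = 1, 2; \qquad \text{(6-16)}$$

$$\bar{x}_i^{(g)} = \frac{1}{n_g}\sum_{r=1}^{n_g} x_{ir}^{(g)}, \qquad i = 1, 2, \ldots, N; \quad g = 1, 2, \qquad \text{(6-17)}$$

with  $x_{ir}^{(g)}$  denoting  the  measurement  of  $x_i$  on  the  r-th  of  the  $n_g$  samples  from  group  g;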


and  S,  the  estimate  of  the  covariance  matrix  V  (assumed
equal  for  the  two  groups),  is  obtained  by  pooling  the  samples  from  the  two  groups.
The  elements  of  S  are  given  by

$$s_{ij} = \frac{\sum_{r=1}^{n_1}\left(x_{ir}^{(1)} - \bar{x}_i^{(1)}\right)\left(x_{jr}^{(1)} - \bar{x}_j^{(1)}\right) + \sum_{r=1}^{n_2}\left(x_{ir}^{(2)} - \bar{x}_i^{(2)}\right)\left(x_{jr}^{(2)} - \bar{x}_j^{(2)}\right)}{n_1 + n_2 - 2}. \qquad \text{(6-18)}$$
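The plug-in procedure of Equations 6-15 to 6-18 can be sketched as follows, assuming each sample is a list of N measurements. The pooled matrix S uses the divisor $n_1 + n_2 - 2$ of Equation 6-18. The small data set is invented for illustration, and the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mean_vector(X):
    """Eq. 6-17: componentwise sample mean of the rows of X."""
    n = len(X)
    return [sum(row[i] for row in X) / n for i in range(len(X[0]))]


def pooled_covariance(X1, X2):
    """Eq. 6-18: covariance estimate pooled over the two samples."""
    N = len(X1[0])
    S = [[0.0] * N for _ in range(N)]
    for X in (X1, X2):
        m = mean_vector(X)
        for row in X:
            for i in range(N):
                for j in range(N):
                    S[i][j] += (row[i] - m[i]) * (row[j] - m[j])
    dof = len(X1) + len(X2) - 2
    return [[s / dof for s in r] for r in S]


def plug_in_log_L(x, X1, X2):
    """Eq. 6-15: estimated log likelihood ratio for a new pattern x."""
    m1, m2 = mean_vector(X1), mean_vector(X2)
    a = solve(pooled_covariance(X1, X2), [u - v for u, v in zip(m1, m2)])
    mid = [(u + v) / 2 for u, v in zip(m1, m2)]
    # x' S^{-1} d - 1/2 (xbar1 + xbar2)' S^{-1} d  equals  (x - mid)' S^{-1} d
    return sum(ai * (xi - mi) for ai, xi, mi in zip(a, x, mid))


X1 = [[2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [3.0, 4.5]]
X2 = [[0.0, 1.0], [1.0, 0.5], [0.5, 2.0], [1.5, 1.5]]
print(plug_in_log_L([3.0, 4.0], X1, X2) > 0)   # True: at the group 1 sample mean
print(plug_in_log_L([0.5, 1.0], X1, X2) > 0)   # False
```

The value returned would then be compared with $\log t$ exactly as in the known-parameter case.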


For  the  multigroup  case,  if  the  costs  $k_{jg}$  are  known  or  can  be
reasonably  assigned,  the  expression  in  Equation  6-14  becomes

$$R'(j) = \sum_{g=1}^{G} \hat{f}_g(x)\, \hat{q}_g\, k_{jg}, \qquad j = 1, 2, \ldots, G, \qquad \text{(6-19)}$$


1.  P.  G.  Hoel  and  R.  P.  Peterson,  "A  Solution  to  the  Problem  of  Optimum
Classification,"  Ann.  Math.  Statistics,  Vol.  20,  1949.
where

$$\hat{q}_g = \frac{n_g}{\sum_{g=1}^{G} n_g}, \qquad g = 1, 2, \ldots, G, \qquad \text{(6-20)}$$


and  $\hat{f}_g(x)$  is  obtained  by  substituting  maximum  likelihood  sample  estimates
for  true  parameters  in  $f_g(x)$.  Thus,  for  example,  if  the  $f_g(x)$  are  assumed
to  be  multivariate  normal  density  functions  with  equal  covariance  matrices,

$$\hat{f}_g(x) = \frac{1}{(2\pi)^{N/2}\,|S|^{1/2}}\exp\left[-\tfrac{1}{2}\left(x - \bar{x}^{(g)}\right)' S^{-1}\left(x - \bar{x}^{(g)}\right)\right], \qquad \text{(6-21)}$$

where  $\bar{x}^{(g)}$  represents  the  (column)  vector  given  in  Equation  6-16  with  elements
given  by  Equation  6-17  for  g  =  1, 2, ..., G,  and  $S^{-1}$  represents  the
inverse  of  the  matrix  S,  which  by  analogy  with  Equation  6-18  has  elements

$$s_{ij} = \frac{1}{\sum_{g=1}^{G} n_g - G}\sum_{g=1}^{G}\sum_{r=1}^{n_g}\left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right), \qquad i, j = 1, 2, \ldots, N. \qquad \text{(6-22)}$$


If  the  covariance  matrices  are  not  assumed  to  be  equal  for  the  various  groups,
then,  instead  of  obtaining  one  estimate  by  pooling  all  samples,  it  will  be
necessary  to  obtain  separate  estimates  $S_g$  with  elements

$$s_{ij}^{(g)} = \frac{1}{n_g - 1}\sum_{r=1}^{n_g}\left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right), \qquad g = 1, 2, \ldots, G. \qquad \text{(6-23)}$$
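For G groups, the estimates of Equations 6-20, 6-22 and 6-23 can be sketched as below, with samples given as lists of N-component rows. The last function applies Equation 6-19 with zero-one costs, which amounts to picking the group with the largest $\hat{q}_g \hat{f}_g(x)$; because S is pooled, the factor $(2\pi)^{-N/2}|S|^{-1/2}$ of Equation 6-21 is common to all groups and is dropped. All function names are hypothetical, and the data are invented.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mean_vector(X):
    n = len(X)
    return [sum(row[i] for row in X) / n for i in range(len(X[0]))]


def scatter(X):
    """Sums of cross-products of deviations from the sample mean of X."""
    m, N = mean_vector(X), len(X[0])
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) for j in range(N)]
            for i in range(N)]


def prior_estimates(samples):
    """Eq. 6-20: q_g estimated by n_g divided by the total sample size."""
    total = sum(len(X) for X in samples)
    return [len(X) / total for X in samples]


def pooled_S(samples):
    """Eq. 6-22: covariance pooled over all G groups (divisor sum n_g - G)."""
    N, G = len(samples[0][0]), len(samples)
    dof = sum(len(X) for X in samples) - G
    W = [[0.0] * N for _ in range(N)]
    for X in samples:
        Sc = scatter(X)
        for i in range(N):
            for j in range(N):
                W[i][j] += Sc[i][j]
    return [[w / dof for w in row] for row in W]


def separate_S(samples):
    """Eq. 6-23: one covariance estimate per group (divisor n_g - 1)."""
    return [[[v / (len(X) - 1) for v in row] for row in scatter(X)] for X in samples]


def assign_zero_one(x, samples):
    """Eq. 6-19 with zero-one costs and normal f_g sharing the pooled S."""
    S, q = pooled_S(samples), prior_estimates(samples)
    best, best_val = None, -math.inf
    for g, X in enumerate(samples):
        u = [xi - mi for xi, mi in zip(x, mean_vector(X))]
        val = math.log(q[g]) - 0.5 * sum(ui * vi for ui, vi in zip(u, solve(S, u)))
        if val > best_val:
            best, best_val = g + 1, val
    return best


samples = [[[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]],
           [[4.0, 4.0], [5.0, 4.5], [4.5, 5.5], [5.0, 5.0]],
           [[0.0, 5.0], [1.0, 6.0], [0.5, 5.5]]]
print(assign_zero_one([0.6, 0.4], samples))   # 1
```

Using `separate_S` instead of `pooled_S` in the classifier would correspond to the unequal-covariance case, where the determinant factor no longer cancels and must be kept.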


When  the  forms  assumed  for  the  density  functions  $f_g(x)$  are  such  that
maximum  likelihood  estimates  of  parameters  are  computationally  difficult  to
obtain,  the  method  of  moments  is  used  to  estimate  parameters  from  sample
moments.  This  method  does  not  possess  the  optimum  properties  of  the
maximum  likelihood  estimator,  but  may  have  to  be  resorted  to  in  some  situations,
as  for  example  when  the  functional  forms  for  $f_g(x)$  are  assumed  to  be
multivariate  versions  of  the  Pearson  system  of  frequency  curves.²


1.  M.  G.  Kendall,  The  Advanced  Theory  of  Statistics,  Vols.  1  and  2,  Hafner
Publishing  Co.,  1948.

2.  W.  P.  Elderton,  Frequency  Curves  and  Correlation,  Cambridge  University
Press,  1938.
6.4  The  Nonparametric  Case


When  the  functional  forms  of  $f_g(x)$  are  not  known,  instead  of  estimating
parameters,  one  must  estimate  the  conditional  group  probabilities  directly.
The  results  obtained  with  such  methods  are,  of  course,  inferior  to  those  of  a  likelihood
ratio  procedure  using  known  density  functions.  The  nonparametric  case
represents  the  usual  practical  situation  in  many  problems  of  classification.
It  also  represents  the  case  which,  because  of  obvious  difficulties,  has  received
the  least  attention  to  date.

Having  available  $n_g$  samples  from  group  g,  g  =  1, 2, ..., G,  one  can
consider  procedures  for  determining  which  of  a  number  of  assumed  forms  for
the  distribution  $f_g(x)$  best  fits  the  samples  from  group  g.  Alternatively,  by  directly  estimating
$f_g(x)$,  which  for  a  given  new  observation  x  is  a  real  number,  or  by
comparing  a  new  pattern  with  known  samples  from  each  group,  one  may
determine  which  set  of  samples  the  new  pattern  most  closely  resembles.

Here  it  is  appropriate  to  consider  a  nonparametric  procedure  presented
by  Fix  and  Hodges¹  for  directly  estimating  the  value  of  the  density  functions
$f_g(x)$  at  a  new  observation  $x = (x_1, x_2, \ldots, x_N)$.


Consider  the  case  of  two  groups.  Since  substitution  of  a  new  observation
into  the  density  functions  results  in  the  real  numbers  $f_g(x)$,  g  =  1, 2,  once
estimates  of  these  two  real  numbers  are  obtained  they  may  be  substituted  into
the  likelihood  ratio.  Fix  and  Hodges  present  a  procedure  which  provides  consistent
estimates  of  the  two  real  numbers  $f_g(x)$.  The  likelihood  ratio  procedure  using
these  estimates  approaches  optimal  discrimination  only  as  the  sample  sizes  become
large.  For  practical  situations  they  suggest  the  following  intuitive  procedure:

In  place  of  $f_1(x)$  and  $f_2(x)$  use  $Q_1/n_1$  and  $Q_2/n_2$  respectively,  and  compare

$$L = \frac{Q_1/n_1}{Q_2/n_2}$$

with  a  threshold  t.  Here  $n_g$,  g  =  1, 2,  represents  the  sample  size  in  group  g,
and  $Q_1$  and  $Q_2$  are  obtained  as  follows:


1.  E.  Fix  and  J.  L.  Hodges,  "Discriminatory  Analysis:  nonparametric  discrimination:
consistency  properties,"  Report  No.  4,  USAF  School  of  Aviation
Medicine,  Randolph  Field,  Texas,  1951.
A  positive  integer  K  is  selected  which  is  large,  but  small  compared
to  the  size  of  the  sample.  An  appropriate  metric  is  imposed  on  the  sample
space.  All  samples  from  all  the  groups  are  pooled  together,  and  of  the  K  values
in  the  pooled  sample  which  are  "nearest"  to  the  new  point  x,  $Q_1$  represents  the
number  from  group  1  and  $Q_2 = K - Q_1$  represents  the  number  from  group  2.

Then  $Q_1/n_1$  and  $Q_2/n_2$  are  estimates  of  $f_1(x)$  and  $f_2(x)$  respectively,  apart  from  a
common  factor  (the  volume  of  the  neighborhood  of  x  containing  the  K  nearest  samples)  which
cancels  in  the  likelihood  ratio.  This  method
of  estimating  the  real  numbers  $f_g(x)$  corresponding  to  a  new  observation  carries
over  to  the  multigroup  case.  Note  that  the  concept  of  nearest  samples  is  stated
in  terms  of  an  appropriate  metric,  but  that  classification  is  performed  by  using
estimates  in  a  likelihood  ratio  or  Bayes  procedure.  This  matter  is  discussed
in  more  detail  in  the  following  paragraphs.
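A minimal sketch of the Fix and Hodges procedure for two groups. The text leaves the metric open; the sketch assumes ordinary Euclidean distance (`math.dist`, Python 3.8 or later) and breaks distance ties by the order of the pooled list. The function name is hypothetical; it returns infinity when none of the K nearest samples comes from group 2.

```python
import math


def fix_hodges_ratio(x, X1, X2, K):
    """Estimate L = (Q1/n1) / (Q2/n2) from the K nearest pooled samples."""
    pooled = [(math.dist(x, s), 1) for s in X1] + [(math.dist(x, s), 2) for s in X2]
    pooled.sort(key=lambda pair: pair[0])          # nearest first
    Q1 = sum(1 for _, g in pooled[:K] if g == 1)   # nearest samples from group 1
    Q2 = K - Q1                                    # the rest come from group 2
    f1_hat, f2_hat = Q1 / len(X1), Q2 / len(X2)
    if f2_hat == 0:
        return math.inf
    return f1_hat / f2_hat


X1 = [[0.0], [1.0], [2.0], [3.0]]
X2 = [[2.5], [4.0], [5.0], [6.0], [7.0], [8.0]]
L = fix_hodges_ratio([2.4], X1, X2, K=4)   # Q1 = 3, Q2 = 1, so L = (3/4)/(1/6)
print(L > 1.0)                             # True: classify into group 1 if t = 1
```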

6.5  Distance,  Direction,  and  Significance

Distance  between  a  new  pattern  and  the  mean  of  a  group,  between  two
groups,  between  samples  of  two  groups,  and  between  a  new  pattern  and  samples
of  a  group  is  often  defined  in  terms  of  a  Euclidean  distance  function,  either  in  the
original  space  of  the  N  predictor  variables  or  in  a  space  obtained  by  a  suitable
coordinate  transformation.  Knowing  that  a  pattern  or  a  sample  belongs  to  one
of  a  number  of  populations,  if  we  can  measure  the  distance  of  the  pattern  or
sample  from  each  of  the  several  populations,  then  it  is  reasonable  to  assign  the
pattern  to  that  population  from  which  it  is  least  distant.

A  distance  function  which  has  received  attention  in  discriminant
analysis  is  the  Mahalanobis  generalized  distance,  denoted  by  $D^2$.  For  the  case
where  there  are  $n_1$  samples  from  one  group  and  $n_2$  samples  from  a  second
group,  the  generalized  distance  between  the  two  samples  is

$$D_N^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} s^{ij}\left(\bar{x}_i^{(1)} - \bar{x}_i^{(2)}\right)\left(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}\right), \qquad \text{(6-24)}$$

where

N  represents  the  number  of  characteristics  measured  on  each
pattern;

$s^{ij}$  are  the  elements  of  $S^{-1}$,  which  is  the  inverse  of  the  matrix  S
whose  elements  are  given  by  Equation  6-18.

As  Rao,  previously  referenced,  has  stated,  $D^2$  represents  (the  square  of)  ordinary  Euclidean
distance  in  a  space  defined  by  a  set  of  oblique  axes.
If  one  defines

$$d_i = \bar{x}_i^{(1)} - \bar{x}_i^{(2)}, \qquad i = 1, 2, \ldots, N, \qquad \text{(6-25)}$$

then  Equation  6-24  becomes

$$D_N^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} s^{ij} d_i d_j. \qquad \text{(6-26)}$$

If  one  further  defines

$$w_{ij} = \sum_{g=1}^{2}\sum_{r=1}^{n_g}\left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right) \qquad \text{(6-27)}$$


and  lets  d  denote  the  column  vector  $(d_1, d_2, \ldots, d_N)'$,  then  one  can  rewrite
Equation  6-26  as

$$D_N^2 = (n_1 + n_2 - 2)\, d' W^{-1} d, \qquad \text{(6-28)}$$

where

$d'$  denotes  the  transpose  of  d;

W  represents  the  matrix  whose  elements  are  $w_{ij}$.

W  is  sometimes  called  the  within-samples  scatter  matrix.  This
terminology  is  explained  in  Appendix  H.
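The two forms of the generalized distance, Equations 6-26 and 6-28, can be computed side by side. The sketch below (hypothetical helper names, samples as lists of rows) builds d of Equation 6-25 and W of Equation 6-27, forms $S = W/(n_1 + n_2 - 2)$ as in Equation 6-18, and evaluates both expressions, which agree up to rounding error.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mean_vector(X):
    n = len(X)
    return [sum(row[i] for row in X) / n for i in range(len(X[0]))]


def within_scatter(X1, X2):
    """Eq. 6-27: within-samples scatter matrix W pooled over both groups."""
    N = len(X1[0])
    W = [[0.0] * N for _ in range(N)]
    for X in (X1, X2):
        m = mean_vector(X)
        for r in X:
            for i in range(N):
                for j in range(N):
                    W[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    return W


def mahalanobis_D2(X1, X2):
    """Return D_N^2 computed by Eq. 6-26 and by Eq. 6-28."""
    n1, n2 = len(X1), len(X2)
    d = [u - v for u, v in zip(mean_vector(X1), mean_vector(X2))]          # Eq. 6-25
    W = within_scatter(X1, X2)
    S = [[w / (n1 + n2 - 2) for w in row] for row in W]                     # Eq. 6-18
    via_S = sum(di * ci for di, ci in zip(d, solve(S, d)))                  # Eq. 6-26
    via_W = (n1 + n2 - 2) * sum(di * ci for di, ci in zip(d, solve(W, d)))  # Eq. 6-28
    return via_S, via_W


print(mahalanobis_D2([[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]))  # (9.0, 9.0)
```

In the one-variable example the pooled variance is 1 and the means differ by 3, so $D^2 = 9$, the squared mean difference in units of the within-group standard deviation.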

In  order  to  use  a  statistic  such  as  $D^2$  it  is  necessary  to  know  its
distribution  under  the  various  situations  which  can  occur,  viz.,  when  the  parameters
used  in  the  statistic,  such  as  means  and  covariances,  are  true  population
parameters  or  sample  estimates  (these  two  situations  being  referred  to
as  the  "classical"  and  "studentized"  cases  respectively);  when  the  characteristics
are  independent  or  correlated;  and  when  the  populations  are  actually  the
same  or  differ  in  at  least  one  of  the  N  variables  (these  latter  situations  being
referred  to  as  the  central  and  noncentral  cases  respectively).  The  solution  of
the  distribution  problem  for  tests  such  as  $D^2$  is  a  difficult  one,  and  all  available
solutions  assume  that  the  samples  come  from  multivariate  normal  populations
with  equal  covariance  matrices.  The  application  of  such  tests  to  situations  in
which  these  assumptions  cannot  reasonably  be  made  is  thus  difficult  to  justify.
To  test  whether  the  predictors  contain  information  for  discrimination,  it  is
necessary  to  test  the  hypothesis  that  the  samples  actually  come  from  the  same
group,  so  that  $D_N^2$  would  differ  from  zero  only  because  of  the  variability  within
a  group.  Thus  the  null  hypothesis  is  that  the  expectation  of  the  vector  d,  defined
above,  is  zero.  The  statistic¹

$$F = \frac{n_1 + n_2 - N - 1}{N\,(n_1 + n_2 - 2)} \cdot \frac{n_1 n_2}{n_1 + n_2}\, D_N^2 \qquad \text{(6-29)}$$

is,  in  this  situation,  i.e.,  the  central  case,  distributed  according  to  the  variance
ratio  or  F  distribution  with  N  and  $(n_1 + n_2 - N - 1)$  degrees  of  freedom.  This
latter  terminology  arises  from  the  fact  that  if  $s_1^2$  is  the  sample  estimate  of  the
population  variance  of  a  univariate  normal  distribution  based  on  a  sample  of  size
$n_1$,  and  $s_2^2$  is  an  independent  estimate  of  the  same  variance  based  on  a  sample  of  size  $n_2$,
then  the  distribution  of  the  ratio

$$F = \frac{s_1^2}{s_2^2} \qquad \text{(6-30)}$$

is  the  F  distribution  with  $(n_1 - 1)$  and  $(n_2 - 1)$  degrees  of  freedom.

For  testing  whether  adding  Q  more  predictors  to  the  set  of  N  predictors
already  in  use  is  going  to  help  in  increasing  the  distance  between  the
two  groups,  Rao¹  suggests  the  ratio

$$R = \frac{1 + \dfrac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}\, D_{N+Q}^2}{1 + \dfrac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}\, D_N^2} \qquad \text{(6-31)}$$


1.  C.  R.  Rao,  Advanced  Statistical  Methods  in  Biometric  Research,  John 
Wiley  and  Sons,  New  York,  1952. 
where  $D_{N+Q}^2$  is  the  distance  based  on  N + Q  characteristics,  so  that  in  computing
$w_{ij}$  from  Equation  6-27  one  has  i, j  =  1, 2, ..., N + Q.  When  $D_{N+Q}^2$
is  of  the  same  order  as  $D_N^2$,  the  ratio  R  will  be  approximately  1,  while  a
high  ratio  would  indicate  that  the  Q  additional  characteristics  are  contributing
to  the  separation  between  groups.  The  test  statistic  in  this  case  is

$$F = \frac{n_1 + n_2 - N - Q - 1}{Q}\,(R - 1), \qquad \text{(6-32)}$$

which  is  distributed  according  to  the  F  distribution  with  Q  and  $(n_1 + n_2 - N - Q - 1)$
degrees  of  freedom.
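The test statistics of Equations 6-29, 6-31 and 6-32 are simple arithmetic once the generalized distances are available. The sketch leaves the comparison with tabulated F percentage points to the reader, since the critical value depends on the chosen significance level; the function names and numbers are illustrative.

```python
def f_central(D2, n1, n2, N):
    """Eq. 6-29: F statistic with N and n1 + n2 - N - 1 degrees of freedom."""
    c = n1 * n2 / (n1 + n2)
    return (n1 + n2 - N - 1) / (N * (n1 + n2 - 2)) * c * D2


def rao_ratio(D2_N, D2_NQ, n1, n2):
    """Eq. 6-31: ratio R measuring the contribution of Q added predictors."""
    k = n1 * n2 / ((n1 + n2) * (n1 + n2 - 2))
    return (1 + k * D2_NQ) / (1 + k * D2_N)


def f_added(R, n1, n2, N, Q):
    """Eq. 6-32: F statistic with Q and n1 + n2 - N - Q - 1 degrees of freedom."""
    return (n1 + n2 - N - Q - 1) / Q * (R - 1)


# n1 = n2 = 10; D^2 = 1.0 with N = 2 predictors, D^2 = 2.0 after adding Q = 1.
print(round(f_central(1.0, 10, 10, 2), 4))   # 2.3611
R = rao_ratio(1.0, 2.0, 10, 10)
print(round(f_added(R, 10, 10, 2, 1), 4))    # 3.4783
```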


Let  $\Delta_N$  be  the  "true"  Mahalanobis  distance  between  the  two  population
distributions  of  the  first  N  variables,  i.e.,  $\Delta_N^2$  is  the  distance  one  would  compute
if  one  actually  knew  the  population  mean  vectors  $M^{(1)} = (m_1^{(1)}, m_2^{(1)}, \ldots, m_N^{(1)})$
and  $M^{(2)} = (m_1^{(2)}, m_2^{(2)}, \ldots, m_N^{(2)})$  and  the  covariance  matrix  V,  assumed
equal  for  the  two  distributions.  The  reason  that  the  difference  $(D_{N+Q}^2 - D_N^2)$
cannot  be  tested  directly  is  that  the  distribution  of  $(D_{N+Q}^2 - D_N^2)$  involves
$\Delta_N$,  and  since  $\Delta_N$  is  not  known,  an  exact  test  of  significance  cannot  be  made.


By  setting  $D_0^2 = 0$  and  proceeding  in  a  step-by-step  manner,  the  above
test  may  also  be  used  to  screen  a  set  of  N  predictors  to  find  the  Q < N  best  predictors.
The  first  predictor  $X_1^*$  is  selected  by  finding  for  which  one  of  the
N  predictors  $x_1, x_2, \ldots, x_N$  the  ratio  R  of  Equation  6-31  is  largest,  when
in  its  denominator  $D_N^2 = D_0^2$  and  in  its  numerator  $D_{N+Q}^2 = D_1^2$,  the  distance
based  on  the  single  candidate  predictor.  This  largest
R  is  substituted  in  the  statistic  of  Equation  6-32  with  N  =  0  and  Q  =  1.  If,
according  to  this  test,  $X_1^*$  is  significant  at  the  prescribed  level  of  significance,
then  proceed  to  determine  the  second  best  predictor  $X_2^*$  of  the  remaining  (N - 1)
predictors  by  finding  for  which  one

$$R = \frac{1 + \dfrac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}\, D_2^2}{1 + \dfrac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}\, D_1^2} \qquad \text{(6-34)}$$

is  largest,  $D_2^2$  being  the  distance  based  on  $X_1^*$  together  with  the  candidate  and
the  statistic  of  Equation  6-32  then  being  used  with  N  =  1  and  Q  =  1.
Again  the  selection  of  $X_2^*$  is  conditional  upon  its  being  significant.
In  this  manner  selection  is  continued  until  $X_Q^*$  fails  to  be  statistically
significant.  Rao's  generalization  of  $D^2$  to  the  case  of  more  than  two  groups
is  described  in  Appendix  H.  Its  use  in  testing  the  significance  of  predictors
and  in  screening  variables  is  described  by  Rao¹  and  Miller.²  The  relation
between  $D^2$  and  the  symmetric  divergence,  a  general  measure  of  the  divergence
between  two  populations,  and  an  approximation  to  the  symmetric  divergence
for  the  case  of  binary  variables  are  also  described  in  Appendix  H.
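The step-by-step screening of predictors described above can be sketched as a forward selection loop. The sketch assumes the caller supplies `f_crit(df1, df2)`, a hypothetical function returning the tabulated F percentage point at the chosen significance level, and it recomputes $D^2$ from the samples for each candidate subset. Since the denominator of R is the same for every candidate at a given step, picking the largest R is the same as picking the largest $D^2$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def D2_subset(X1, X2, idx):
    """Generalized distance D^2 (Eq. 6-26) using only the predictors in idx."""
    Y1 = [[r[i] for i in idx] for r in X1]
    Y2 = [[r[i] for i in idx] for r in X2]
    n1, n2, N = len(Y1), len(Y2), len(idx)
    m1 = [sum(r[i] for r in Y1) / n1 for i in range(N)]
    m2 = [sum(r[i] for r in Y2) / n2 for i in range(N)]
    S = [[0.0] * N for _ in range(N)]
    for Y, m in ((Y1, m1), (Y2, m2)):
        for r in Y:
            for i in range(N):
                for j in range(N):
                    S[i][j] += (r[i] - m[i]) * (r[j] - m[j]) / (n1 + n2 - 2)
    d = [u - v for u, v in zip(m1, m2)]
    return sum(di * ci for di, ci in zip(d, solve(S, d)))


def forward_select(X1, X2, f_crit):
    """Add predictors one at a time while the F test of Eq. 6-32 is significant."""
    n1, n2 = len(X1), len(X2)
    k = n1 * n2 / ((n1 + n2) * (n1 + n2 - 2))
    chosen, D2_prev = [], 0.0                        # D_0^2 = 0
    remaining = list(range(len(X1[0])))
    while remaining:
        D2 = {i: D2_subset(X1, X2, chosen + [i]) for i in remaining}
        best = max(remaining, key=lambda i: D2[i])
        R = (1 + k * D2[best]) / (1 + k * D2_prev)   # Eq. 6-31 (Eq. 6-34 at step 2)
        N = len(chosen)
        F = (n1 + n2 - N - 2) * (R - 1)              # Eq. 6-32 with Q = 1
        if F < f_crit(1, n1 + n2 - N - 2):
            break                                    # candidate is not significant
        chosen.append(best)
        remaining.remove(best)
        D2_prev = D2[best]
    return chosen


X1 = [[5.0, 1.0, 0.3], [6.0, 2.0, -0.2], [5.5, 3.0, 0.1], [6.5, 1.5, 0.4], [5.2, 2.5, -0.1]]
X2 = [[0.0, 2.0, 0.2], [1.0, 1.0, 0.1], [0.5, 2.5, -0.3], [1.5, 3.0, 0.35], [0.2, 1.5, -0.05]]
print(forward_select(X1, X2, lambda df1, df2: 4.0))   # [0]
```

The constant critical value in the usage line is only a placeholder; in practice it would be looked up for each step's degrees of freedom.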


Given  $n_1$  samples  from  group  1  and  $n_2$  samples  from  group  2,  each
group  can  be  represented  by  points  in  an  N-dimensional  Euclidean  space.
Now  the  question  is  whether  the  two  sets  of  samples  can  be  projected  onto  a
line  in  such  a  way  that,  in  some  sense,  the  separation  between  projected  points
belonging  to  different  groups  is  maximized  relative  to  the  separation  between
projected  points  belonging  to  the  same  group.


Let  the  direction  cosines  of  the  line  to  be  determined  be  proportional
to  $(a_1, a_2, \ldots, a_N)$,  and  let

$$z = \sum_{i=1}^{N} a_i x_i, \qquad \text{(6-35)}$$

so  that  $z_1^{(1)}, z_2^{(1)}, \ldots, z_{n_1}^{(1)}$  represent  the  projected  samples  of  group  1  and
$z_1^{(2)}, z_2^{(2)}, \ldots, z_{n_2}^{(2)}$  represent  the  projected  samples  of  group  2.
Further,  let  $\bar{z}^{(1)}$  and  $\bar{z}^{(2)}$  represent  the  mean  values  of  the  two  samples  of  z's,
and  let  $\bar{z}$  be  the  mean  of  the  grand  sample  obtained  by  pooling  the  two  sets  of  samples.

Since  the  first  paper  on  the  subject  by  Fisher,³  the  direction  which
has  usually  been  considered  is  that  for  which  the  ratio


1.  C.  R.  Rao,  Advanced  Statistical  Methods  in  Biometric  Research,  John 
Wiley  and  Sons,  New  York,  1952. 

2.  R.  G.  Miller,  "Statistical  Prediction  by  Discriminant  Analysis,  " 
Meteorological  Monographs,  Vol.  4,  No.  25,  October  1962,  Amer. 
Meteorological  Society,  Boston,  Mass. 

3.  R.  A.  Fisher,  "The  Use  of  Multiple  Measurements  in  Taxonomic  Problems,"
Annals  of  Eugenics  7,  pp.  179-88,  Sept.  1936;  appears  in  R.  A.  Fisher,
Contributions  to  Math.  Statistics,  John  Wiley,  N.Y.,  1952.
$$\frac{\left[\bar{z}^{(1)} - \bar{z}^{(2)}\right]^2}{\dfrac{1}{n_1 + n_2 - 2}\sum_{g=1}^{2}\sum_{r=1}^{n_g}\left(z_r^{(g)} - \bar{z}^{(g)}\right)^2} \qquad \text{(6-36)}$$
is  maximized.  The  resulting  linear  function  z  is  said  to  have  the  greatest  "variance
between  samples  relative  to  variance  within  samples."  The  direction
which  achieves  this  is  given  in  terms  of  the  vector  $A = (a_1, a_2, \ldots, a_N)'$  by

$$A = \text{const.}\; S^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right), \qquad \text{(6-37)}$$

so  that

$$a_i = s^{1i} d_1 + s^{2i} d_2 + \cdots + s^{Ni} d_N, \qquad \text{(6-38)}$$

where  $s^{ij}$  are  elements  of  $S^{-1}$,  which  is  the  inverse  of  the  matrix  S  whose
elements  $s_{ij}$  are  given  by  Equation  6-18,  and  $d_i = \bar{x}_i^{(1)} - \bar{x}_i^{(2)}$.  The  constant
multiplier  in  Equation  6-37  indicates  that  any  vector  parallel  to  A  determined
from  Equation  6-38  will  do  just  as  well.  If  the  coefficient  values  of  Equation
6-38  are  used  in  Equation  6-35  to  obtain  a  classification  procedure  for  a  new
sample  with  measurements  $(x_1, \ldots, x_N)$,  by  comparing  $z = \sum_{i=1}^{N} a_i x_i$  with  a
threshold  c  and  classifying  the  sample  into  group  1  if  z > c  and  into  group  2  if
z < c,  then  one  is  using  Fisher's  Linear  Discriminant  Function.¹  It  is
clear  that  this  linear  function  is  just  the  first  term  of  the  expression  in
Equation  6-15  for  log  L(x).  Substituting  the  values  of  $a_i$  obtained  in  Equation
6-38  into  Equation  6-36  shows  that  the  maximum  value  of  the  ratio  of  Equation
6-36  is  just  $D_N^2$,  which  was  defined  in  Equation  6-26.  It  is  also  easily  shown
that  the  direction  obtained  in  Equation  6-37  also  results  when,  for  the  projected
samples,  one  maximizes  the  between-samples  scatter  while  keeping  the  within-samples
scatter  fixed,  or  minimizes  the  ratio  of  within-samples  scatter  to  total
scatter.²  The  relation  of  these  various  maximization  and  minimization  problems
to  the  ratio  of


1.  R.  A.  Fisher,  "The  Use  of  Multiple  Measurements  in  Taxonomic  Problems,"
Annals  of  Eugenics  7,  pp.  179-88,  Sept.  1936;  appears  in  R.  A.  Fisher,
Contributions  to  Math.  Statistics,  John  Wiley,  N.Y.,  1952.

2.  S.  S.  Wilks,  "Multidimensional  Statistical  Scatter,"  Contributions  to
Probability  and  Statistics  in  Honor  of  Harold  Hotelling,  Stanford  U.  Press,
1960.
Equation  6-36  is  shown  in  Appendix  H,  where  it  is  also  shown  that  classification
based  on  ordinary  Euclidean  distance  between  the  projection  of  a  new  sample
and  the  projected  samples  from  the  two  groups  is  equivalent  to  using  a  linear
discriminant  function.
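A minimal sketch of Fisher's linear discriminant function (Equations 6-35 to 6-38), with samples given as lists of N-component rows. It computes $A = S^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)})$, evaluates the ratio of Equation 6-36 (squared difference of the projected means over the pooled within-samples variance of the projections), and checks numerically that this maximum equals $D_N^2$ while other directions give smaller values. Placing the threshold c midway between the projected means is one simple choice and an assumption of the sketch; the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mean_vector(X):
    n = len(X)
    return [sum(row[i] for row in X) / n for i in range(len(X[0]))]


def pooled_S(X1, X2):
    """Eq. 6-18: pooled covariance estimate with divisor n1 + n2 - 2."""
    N = len(X1[0])
    S = [[0.0] * N for _ in range(N)]
    for X in (X1, X2):
        m = mean_vector(X)
        for r in X:
            for i in range(N):
                for j in range(N):
                    S[i][j] += (r[i] - m[i]) * (r[j] - m[j]) / (len(X1) + len(X2) - 2)
    return S


def fisher_direction(X1, X2):
    """Eqs. 6-37 and 6-38 with const = 1: a = S^{-1} d."""
    d = [u - v for u, v in zip(mean_vector(X1), mean_vector(X2))]
    return solve(pooled_S(X1, X2), d), d


def ratio_6_36(a, X1, X2):
    """Squared difference of projected means over pooled within variance of z."""
    z1 = [sum(ai * xi for ai, xi in zip(a, r)) for r in X1]
    z2 = [sum(ai * xi for ai, xi in zip(a, r)) for r in X2]
    zb1, zb2 = sum(z1) / len(z1), sum(z2) / len(z2)
    within = (sum((z - zb1) ** 2 for z in z1) + sum((z - zb2) ** 2 for z in z2)) \
        / (len(z1) + len(z2) - 2)
    return (zb1 - zb2) ** 2 / within


X1 = [[2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [3.0, 4.5]]
X2 = [[0.0, 1.0], [1.0, 0.5], [0.5, 2.0], [1.5, 1.5]]
a, d = fisher_direction(X1, X2)
D2 = sum(di * ai for di, ai in zip(d, a))       # Eq. 6-26, since a = S^{-1} d
print(abs(ratio_6_36(a, X1, X2) - D2) < 1e-9)   # True
print(ratio_6_36([1.0, 0.0], X1, X2) < D2)      # True: a poorer direction
zbar = [sum(ai * mi for ai, mi in zip(a, mean_vector(X))) for X in (X1, X2)]
c = (zbar[0] + zbar[1]) / 2                     # threshold midway (an assumption)
z_new = sum(ai * xi for ai, xi in zip(a, [2.5, 3.0]))
print(1 if z_new > c else 2)                    # 1
```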

For  the  case  of  G  groups,  G > 2,  the  above  method  would  require
G(G-1)/2  directions  to  be  computed,  one  for  each  pair  of  groups.  However,
corresponding  to  the  study  of  group  differences  by  the  projection  of  two  sets  of
samples  onto  a  line,  there  is  the  technique  of  multiple  discriminant  functions  for
the  overall  study  of  group  differences  among  more  than  two  groups.  The  method
consists  of  projecting  the  G  sets  of  N-dimensional  samples,  G < N,  onto  a
Euclidean  space  of  (G-1)  dimensions  in  such  a  way  that  the  total  scatter  of  the
G  sets  of  projected  samples  in  this  space  of  (G-1)  dimensions  is  maximized
relative  to  the  within-samples  scatter  of  the  samples.  The  mechanics  of
obtaining  the  coordinates  of  this  "Discriminant  Space"  of  (G-1)  dimensions  are
given  in  Appendix  H.  Corresponding  to  each  coordinate  of  the  (G-1)-dimensional
space,  one  obtains  a  linear  discriminant  function.  The  resulting  discriminant
functions  $z_j$,  j  =  1, 2, ..., (G-1),  which  are  linear  functions  of  the  original
predictors  $x_1, x_2, \ldots, x_N$,  may  be  viewed  as  a  reduced  set  of  variables.

Classification  in  this  space  of  reduced  variables  can  proceed  by  the  methods
presented  under  the  heading  of  probabilistic  classification.

6.6  Practical  Discriminant  Analysis

For  the  case  of  two  groups,  the  linear  and  quadratic  classification
functions  which  have  been  discussed  represent  feasible  solutions  which  can  be
justified  from  one  of  a  number  of  points  of  view.  The  linear  function  given  by
the  first  term  of  Equation  6-15  is,  for  normal  distributions  with  covariance
matrices  assumed  equal,  near-optimum  because  of  its  relation  to  Equation  6-3.
This  function  can  also  be  justified  as  being  the  one  which,  for  arbitrary
distributions,  leads  to  the  maximization  of  the  ratio  of  Equation  6-36.  The  linear
function  or  hyperplane  represents  the  simplest  boundary  which  can  be  used
to  divide  the  sample  space.  Quadratic  classification  functions  represent  the
next  level  of  complexity.  The  substitution  of  sample  estimates  of  parameters
in  Equations  6-8  and  6-9  and  their  subsequent  use  in  the  quadratic  function
of  Equation  6-7  leads  to  a  near-optimum  solution  when  the  distributions  are
multivariate  normal  with  unequal  covariance  matrices  and  the  sample  sizes
are  large.  For  other  continuous  distributions  this  quadratic  function  of
Equation  6-7  may  be  considered  a  "good"  first  approximation.  For  the
case  where  the  covariance  matrices  are  not  assumed  equal,  an  alternative  is
to  compute  the  Anderson-Bahadur  linear  function,  which  finds  the  direction
along  which,  for  arbitrary  distributions,  the  ratio  of  the  difference  between
means  of  projected  samples  to  the  sum  of  the  standard  deviations  of  the  two
sets  of  projected  samples  is  maximized  (see  Appendix  H).
For  binary  variables,  a  second  order  approximation  to  the  likelihood
ratio  is  obtained  from  Equation  6-12,  and  for  the  case  where  the  variables
appear  to  be  only  slightly  correlated,  a  further  approximation  leads  to  the
quadratic  function  of  Equation  6-13.


When  the  number  of  predictors  is  large,  an  initial  screening  of  the
variables  may  be  done  by  using  the  generalized  distance  $D^2$.  This  results
in  a  reduction  of  computation,  since  the  terms  involving  unselected
variables  need  not  be  computed.  The  reduction  in  computation  is  especially
valuable  when  the  number  of  groups  involved  is  large.


For more than two groups, G, a transformation of the sets of samples
from the original N-dimensional space to a discriminant space of (G-1)
dimensions, whose coordinates are linear functions of the original N variables,
enables one to apply classification procedures, both parametric and nonparametric,
in the new space. Since even a linear function of binary variables
takes many distinct values and behaves approximately as a continuous variable, the joint distribution of the new, reduced set of
variables will be better approximated by a multivariate normal distribution
than the joint distribution of the original variables.
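One standard way to construct such a (G-1)-dimensional discriminant space is the classical canonical discriminant construction: take the leading eigenvectors of $W^{-1}B$, where $W$ is the pooled within-group scatter matrix and $B$ the between-group scatter matrix. The report does not specify which variant is intended, so the numpy sketch below is an assumption, and `discriminant_space` is a hypothetical name. It requires $W$ to be non-singular (more samples than variables, with within-group variation in every variable).

```python
import numpy as np


def discriminant_space(groups):
    """Project samples from G groups onto G-1 discriminant coordinates.

    groups: list of (n_g, N) arrays. Returns (list of projected arrays, (N, G-1) matrix A);
    each new coordinate is a linear function of the original N variables.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    G = len(groups)
    grand_mean = np.vstack(groups).mean(axis=0)
    N = grand_mean.size
    W = np.zeros((N, N))  # pooled within-group scatter
    B = np.zeros((N, N))  # between-group scatter
    for g in groups:
        centered = g - g.mean(axis=0)
        W += centered.T @ centered
        d = (g.mean(axis=0) - grand_mean)[:, None]
        B += len(g) * (d @ d.T)
    # Eigenvectors of W^{-1} B; at most G-1 eigenvalues are non-zero.
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1][: G - 1]
    A = vecs[:, order].real
    return [g @ A for g in groups], A
```

Parametric or nonparametric classifiers would then be applied to the projected samples rather than to the original N variables.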


Having obtained a classification procedure, it is useful to evaluate its
performance on new samples in addition to the known samples that were used
to design it. Two types of statistics are of interest.
The first should test whether the procedure being used does better than a suitably
defined "pure chance" method. The second should test whether
one classification function performs significantly better than another.
Such tests are necessary because the performance on a
small sample of unknowns may not be indicative of true performance. Some
statistics that have been suggested for this purpose are presented by Lubin¹
and Miller.²
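For concreteness, the sketch below shows two textbook tests of the two kinds described: a one-sided binomial test of the number of correct classifications against an assumed chance rate `p0`, and an exact McNemar test comparing two procedures on the same samples. These are standard tests offered as an illustration; they are not necessarily the statistics proposed by Lubin or Miller, and how "pure chance" is defined (the value of `p0`) is left to the user.

```python
from math import comb


def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): chance of k or more correct under 'pure chance'."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for comparing two classification functions.

    b: samples classified correctly by procedure 1 but not by procedure 2.
    c: samples classified correctly by procedure 2 but not by procedure 1.
    Samples on which the two procedures agree carry no information and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    return min(1.0, 2 * binomial_tail(max(b, c), n, 0.5))
```

For example, 10 correct out of 10 when chance would give 0.5 yields `binomial_tail(10, 10, 0.5) = 1/1024`.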


1. Lubin, A., "Linear and Non-Linear Discriminating Functions," Brit. Jour.
of Psychology (Stat. Sec.), Vol. 3, pp. 90-104, 1950.

2. R. G. Miller, "Statistical Prediction by Discriminant Analysis,"
Meteorological Monographs, Vol. 4, No. 25, October 1962, Amer.
Meteorological Society, Boston, Mass.


Table 6-1 presents the practical statistical techniques available for
generating linear and quadratic discriminant functions suitable for classifying
unknown samples belonging to one or the other of two populations. A discussion
of the relationship of these techniques to some of the current work on
pattern recognition networks has been presented in references 1 and 2.


1. Kanal, L., "Evaluation of a Class of Pattern Recognition Networks,"
Biological Prototypes and Synthetic Systems, Vol. 1, pp. 261-269,
New York, Plenum Press, 1962.

2. Kanal, L., et al., "Basic Principles of Some Pattern Recognition Systems,"
Proc. National Electronics Conference, Vol. 18, pp. 279-295, October
1962.
SUMMARY OF BASIC CLASSIFICATION PROCEDURES

SECTION 7


DESIGN  OF  COMPUTER  SIMULATION  EXPERIMENTS 


7.1 General Description

The overall purpose of the computer simulation experiments is the
evaluation of techniques for the recognition of targets of military significance
in gray-scale aerial photographs. The input data consist of 100 small photograph
segments, each containing a tank image, and 150 or more photograph
segments containing typical terrain background but no tanks. The experiments
will deal with the problem of identifying tanks only, but the approach is
sufficiently general to be applicable to a variety of other target types.

The images will be converted to digital form and Laplacian-processed
in the Philco S-2000 general-purpose computer into black-and-white
representations showing only the relatively sharp contrast edges. Using a training
sample consisting of 50 tank images and 50 non-tank images, the computer
will then design a number of feature masks, one for each of several sub-areas
of the image space. These feature masks will be sums of weighted binary
picture-element values and, in some cases, of weighted products of pairs of
picture-element values. In the mathematical sense, the masks are linear and
quadratic discriminant functions. The weights are the coefficients of these discriminant
functions, calculated from the resolution-element values obtained in the sample
(50 tanks and 50 non-tanks) on the basis of the principles of statistical decision
theory. The masks will be designed to produce maximum outputs for tank
images and minimum outputs for non-tank images. There are several alternative
procedures for calculating the coefficients; the optimum procedure for the
present problem has not been determined. Consequently, each alternative
procedure must be used to design a set of masks, and the performance of the
different sets must be compared to determine which is best.

After  calculating  the  coefficients,  the  computer  will  calculate  and 
print  out  the  mask  outputs  for  each  of  the  50  tank  and  50  non-tank  images  in  the 
training  sample.  There  will  be  a  print-out  for  each  of  the  several  sets  of 
masks  designed  according  to  the  alternative  statistical  procedures.  These 
print-outs  of  the  feature  mask  outputs  will  be  studied  to  determine  error  and 
false  alarm  rates  for  various  assumed  threshold  settings,  in  order  to  obtain 
an  idea  of  the  relative  discrimination  capabilities  of  the  alternative  mask  sets. 
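The tabulation described above, error (false dismissal) and false alarm rates as a function of an assumed threshold, can be sketched as follows. The decision convention "tank if the mask output is at or above the threshold" is an assumption consistent with masks designed to give maximum outputs for tanks; the function names are hypothetical.

```python
def error_rates(tank_scores, nontank_scores, threshold):
    """Return (false dismissal rate, false alarm rate) for 'tank if score >= threshold'."""
    misses = sum(1 for y in tank_scores if y < threshold)
    false_alarms = sum(1 for y in nontank_scores if y >= threshold)
    return misses / len(tank_scores), false_alarms / len(nontank_scores)


def rate_table(tank_scores, nontank_scores, thresholds):
    """Tabulate (threshold, false dismissal rate, false alarm rate) over assumed settings."""
    return [(t, *error_rates(tank_scores, nontank_scores, t)) for t in thresholds]
```

Running `rate_table` once per mask set on the printed outputs gives directly comparable curves for the alternative designs.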

Assuming a set of thresholds giving equal false-alarm and false-dismissal
error rates, the binary values of the thresholded feature mask outputs will be
calculated for each member of the training sample, and the
computer will use these data to calculate the coefficients of a final discriminant
function for each mask set. The computer will then calculate and print out the
analog value of this final discriminant function for each image of the training
sample. Here again, a print-out will be made for the outputs of each alternative
set of feature masks. Inspection of this print-out will show the error
rates and false alarm rates obtained for various assumed threshold settings on
the final decision function. These figures will provide another basis for comparison
of alternative mask designs, but not a conclusive one, inasmuch as the
comparison will be based only on the performance of the simulated recognition
systems on the training sample, i.e., the same set of images previously used
to design the feature masks.
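A minimal sketch of the equal-error step: among the observed scores, pick the threshold at which the false dismissal and false alarm rates are closest to equal, then convert each mask output to a binary value for the final decision logic. Searching only the observed score values and breaking ties toward the lowest threshold are assumptions; the names are hypothetical.

```python
def equal_error_threshold(tank_scores, nontank_scores):
    """Threshold (among observed scores) where false dismissal and false alarm rates are
    closest to equal, deciding 'tank' if score >= threshold."""
    def rates(t):
        miss = sum(y < t for y in tank_scores) / len(tank_scores)
        false_alarm = sum(y >= t for y in nontank_scores) / len(nontank_scores)
        return miss, false_alarm

    candidates = sorted(set(tank_scores) | set(nontank_scores))
    return min(candidates, key=lambda t: abs(rates(t)[0] - rates(t)[1]))


def binarize(scores, threshold):
    """Thresholded (binary) mask outputs: 1 if score >= threshold, else 0."""
    return [1 if y >= threshold else 0 for y in scores]
```

With tank scores `[3, 4, 5, 6]` and non-tank scores `[1, 2, 3, 4]`, the threshold 4 gives a rate of 0.25 for both kinds of error.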

The final design phase of the experiment will provide the decision
logic for a set of random masks. The computer will first determine the random
mask outputs for the training sample by summing the products of all
the binary resolution-element values and the binary values of a random mask
at the corresponding points in the image space, and repeating this operation for
all the random masks. The computer will then calculate an optimum discriminant
function and threshold (final decision logic) for the random masks.
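The random-mask outputs described above reduce to a dot product between the binary image and each binary mask. In the sketch below the mask generator, its 0.5 density of ones, and the seeded generator are illustrative assumptions, since the report does not say how the random masks are drawn.

```python
import random


def make_random_masks(num_masks, num_elements, density=0.5, seed=0):
    """Hypothetical generator: each mask is a random 0/1 pattern over the image space."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(num_elements)]
            for _ in range(num_masks)]


def random_mask_outputs(image, masks):
    """Output of each mask: sum over elements of (binary pixel value x binary mask value)."""
    return [sum(x * m for x, m in zip(image, mask)) for mask in masks]
```

The vector returned by `random_mask_outputs` for each training image would then be the input from which the final discriminant function and threshold are computed.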

In the next phase of the experiment, the performance of each of the alternative
system designs will be evaluated and compared on
an unknown sample, i.e., a fresh sample set of tank and non-tank images not
used in designing the recognition logic. Fifty new tank images and at least
100 fresh non-tank images will be processed by a computer simulation of the
recognition process, and the final discriminant function output before thresholding
will be printed out for each member of the sample. This step will be
repeated for each simulated recognition logic. Inspection of these print-outs
will provide estimates of error rates and false alarm rates for various assumed
threshold settings, with each simulated system design operating on an
unknown sample.

False alarm rates are expected to vary considerably, depending on
the degree of resemblance of the non-tank image patterns to tank images. A
number of selected non-tank samples from different populations will be put
through the computer simulation for one or more simulated recognition systems
in order to obtain separate estimates of false alarm rates on each of a
number of typical backgrounds (e.g., open country, rocky terrain, built-up
areas, and hedgerows). This will probably necessitate a repetition of the
computer simulation with additional samples of non-tank images.

After completing and evaluating the results of this first series of experiments
simulating systems using random masks and optimum feature masks,
it is planned to conduct further experiments on a universal feature mask system.
This system will use a number of geometrical and perhaps some texture-feature
masks, and test for these features over all sub-areas in the image
space. These would be "general purpose" features, including straight lines and
parallel straight lines of various lengths at each of a number of angles, corners
at various angles, and circles. The computer will compile statistics on
the numbers of each of these features seen in samples of each of a number of
different target classes and non-target classes. The decision logic will then
classify an unknown image according to the number of each of the features it
contains.


An alternative technique differing somewhat from the one just described
will also be investigated in these follow-on experiments. Rather than
compiling statistics on the number of features throughout the image space,
regardless of their location, the image will be arbitrarily
divided into a number of blocks, and the existence or non-existence of each
feature will be determined for each block. This information will then be used
as input to a recognition logic based not only on the occurrence of individual
features, but also on where these features are located in the image space.
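The two input encodings just contrasted can be sketched side by side: a location-free count of each feature over the whole image, and a per-block presence flag for each feature. Each feature map is assumed to be a 2-D grid of 0/1 detector outputs; the block size and all function names are hypothetical.

```python
def block_presence(detections, block_rows, block_cols):
    """One 0/1 flag per block (row-major): 1 if the feature occurs anywhere in that block.

    detections: 2-D list of 0/1 values marking where a feature detector fired.
    """
    rows, cols = len(detections), len(detections[0])
    flags = []
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            hit = any(detections[r][c]
                      for r in range(r0, min(r0 + block_rows, rows))
                      for c in range(c0, min(c0 + block_cols, cols)))
            flags.append(1 if hit else 0)
    return flags


def location_features(feature_maps, block_rows, block_cols):
    """Concatenate the per-block flags of several features into one binary input vector."""
    vector = []
    for fmap in feature_maps:
        vector.extend(block_presence(fmap, block_rows, block_cols))
    return vector


def feature_counts(feature_maps):
    """Location-free alternative: number of detections of each feature anywhere in the image."""
    return [sum(map(sum, fmap)) for fmap in feature_maps]
```

The location-based vector is longer (features times blocks) but lets the recognition logic weigh where a corner or line appears, not just how many were found.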

The constraints selected to simplify the problem have been chosen so
that they do not prohibit a generalization of the results. The generalizations,
it is hoped, will permit the establishment of rules and techniques that in turn
will lead to the design of more complex recognition logic. The simplifications
assumed for the present model are summarized in Table 7-1.

The sample imagery for these experiments is taken from two rolls of
aerial photography (negative) taken during Army maneuvers at Fort Drum,
New York, in August 1960. The scale of this photography is approximately
1:6000. One hundred images of tanks were cropped from contact prints. These
print segments were then assembled into a ten-by-ten mosaic with all the tank
images oriented in the same direction. One hundred other photograph segments,
containing miscellaneous natural and man-made objects, were similarly
assembled into a ten-by-ten mosaic. The mosaics were then photographed and
printed as positive transparencies on four-by-five-inch glass plates suitable for
input to the Philco IMITAC* equipment. The tank mosaics are shown in Figure
7-1. It is planned to prepare additional non-tank mosaics containing selected
patterns representative of various typical backgrounds (e.g., open country,
built-up areas).


* "IMage Input To Automatic Computer": a flying-spot scanner and associated
circuitry that converts an image into digital form to six-bit precision
(on tape) preparatory to processing the video data in the Philco S-2000
general-purpose computer.
TABLE 7-1

SIMPLIFYING ASSUMPTIONS FOR SIMULATION MODEL

| Simplifying Assumption | Comment |
|---|---|
| Gray-level information is discarded in the preprocessing operation. | Although information is discarded in the process, this technique does not compromise proof of feasibility with gray-scale images. |
| Tank mosaics are presented in registry. | No real performance compromise, since registry testing is expected to be used to eliminate registry problems in any future machine; alternately, redundancy can be introduced to handle registry. |
| Interdependence of elements of different regions of the retinal space is ignored in designing masks. | Final decision logic will consider pairwise dependent relationships and thus compensate in part for the simplification. |
| Initial size of the region for which a mask is designed is taken as about 50 retinal elements. | Some overlapping of regions will be performed to gain insight into the effect of this limitation. |
| Restricted population of images. | Careful choice of images to avoid simple cases or unusual views should give a "typical" set. |



7.2 Discriminant Functions Utilized in the Simulation


Introduction 


The first requirement is to generate intermediate decision statistics
of the form

$$Y_r^{(s)} = \sum_{i=1}^{M_1} b_i x_i + \sum_{i,j=1}^{M_2} a_{ij} x_i x_j \qquad (r = 1, \ldots, 6;\ s = 1, 2, 3) \qquad (7\text{-}1)$$

where

$x_i$ = the response from the $i$th retinal element
$M_1$ = the number of retinal elements in the $r$th region with significant linear weights $b_i$
$M_2$ = the number of pairs of retinal elements with significant weights $a_{ij}$
$r$ = the index indicating the $r$th subset of retinal elements
$s$ = an index signifying the statistical method applied in determining the weighting coefficients
$Y_r^{(s)}$ = the response from the $r$th feature mask derived by an application of the $s$th discriminant function.


These decision statistics (feature masks) will be designed by determining
weighting coefficients for each element and each pair of the 50 retinal
elements in the $r$th subset.
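Evaluating a feature mask of the form of Equation 7-1 on a binary retinal pattern is a short computation once the significant weights are known. The sketch below stores only the $M_1$ significant linear weights and the $M_2$ significant pair weights in dictionaries keyed by 0-based element indices; that storage choice and the function name are assumptions for illustration.

```python
def mask_response(x, linear, pairs):
    """Y = sum_i b_i x_i + sum_(i,j) a_ij x_i x_j  (form of Equation 7-1).

    x: sequence of 0/1 retinal responses for the region (0-based indices).
    linear: dict {i: b_i} holding only the M1 significant linear weights.
    pairs: dict {(i, j): a_ij} holding only the M2 significant pair weights.
    """
    y = sum(b * x[i] for i, b in linear.items())
    y += sum(a * x[i] * x[j] for (i, j), a in pairs.items())
    return y
```

For example, with `x = [1, 0, 1, 1]`, linear weights `{0: 0.5, 1: 2.0, 3: -1.0}` and pair weights `{(0, 2): 1.5, (1, 3): 4.0}`, the response is 0.5 - 1.0 + 1.5 = 1.0.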

Linear Discriminant Function per R. A. Fisher¹

The linear discriminant function of R. A. Fisher is one of the basic
techniques that will be used. Since this is a linear function, Equation (7-1)
reduces to

$$Y_r^{(s)} = \sum_{i=1}^{M_1} b_i x_i \qquad (7\text{-}2)$$

1. R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems,"
Annals of Eugenics 7, pp. 179-188, Sept. 1936; also appears in R. A. Fisher,
Contributions to Mathematical Statistics, John Wiley, N.Y., 1952.


In this case, $M_1$ of the linear weights (where $M_1 < 50$) will be used to implement
the feature mask for the particular region of the retinal space involved.
The most significant weights will be selected; the Mahalanobis test of
significance described in Section 6 is expected to be satisfactory for
selecting them.

Quadratic Discriminant Function with Assumed Normality¹

Two quadratic discriminant functions are used. They differ in the
following general way. The first technique (titled above) assumes that the retinal
responses, which are binary variables, follow a multivariate normal distribution.
Under this assumption the logarithm of the likelihood ratio can be written down, and
the weighting coefficients are derived by grouping its linear and quadratic terms.
The covariance matrices of the two populations are assumed to be unequal;
otherwise the quadratic terms would cancel. Keeping the significant coefficients, a quadratic discriminant function of
the form of Equation 7-1 is established.

Quadratic Discriminant Function per Lazarsfeld and Bahadur²

This second quadratic function makes particular use of the fact that
the underlying variables are binary. The actual density function of the retinal
responses is approximated by the Lazarsfeld-Bahadur expansion of density
functions of binary variables. Dependence of order higher than pairwise is neglected.
This approximation also results in a quadratic form like Equation 7-1, under
conditions specified in Appendix H.

7.3 Details of the Computation

Six decision statistics of the form indicated in Equation 7-1 will be
generated for each of the three discriminant functions listed above. The computer
will calculate sample values of the $Y_r^{(s)}$'s and will print out analog
values to be used for evaluation and comparison. This print-out will be used
to estimate threshold effects, to compare the performance of one set of
masks with another, and to judge the effectiveness of the masks for purposes
of classification.


1. C. R. Rao, Advanced Statistical Methods in Biometric Research, John Wiley
and Sons, New York, 1952.

2. Bahadur, R. R., "On Classification Based on Responses to n Dichotomous
Items," USAF SAM Series in Statistics, Randolph AFB, Texas, 1959.
Intermediate decisions will be made about the presence or absence of
a feature, yielding binary variables as inputs to the final decision logic. That
is, the final decision statistic for each of the discriminant techniques will take
the form

$$Z_s = \sum_{r=1}^{6} c_r\, Y_r^{(s)}(\eta) + \sum_{r,q=1}^{6} d_{rq}\, Y_r^{(s)}(\eta)\, Y_q^{(s)}(\eta) \qquad (7\text{-}3)$$

where

$$Y_r^{(s)}(\eta) = \begin{cases} 1, & Y_r^{(s)} \ge \eta \\ 0, & Y_r^{(s)} < \eta \end{cases} \qquad (7\text{-}4)$$

The coefficients $c_r$ and $d_{rq}$ will be calculated according to the method found
most effective in specifying the coefficients $a_{ij}$ and $b_i$ (Equation 7-1 above).
The computer will calculate and print out the coefficients $c_r$ and $d_{rq}$ and
analog values of $Z_s$ ($s = 1, 2, 3$) for sets of $Y_r^{(s)}$ derived from each discriminant
technique. These print-outs will be used in evaluating and comparing these
alternative techniques. Final decisions, based on whether or not $Z_s > \theta$, will
be made on test samples containing 50 line drawings* of tank mosaics and two
or more groups of 50 line drawings of non-tank populations.

The results of the above experiment will be used as a guide for the
design of feature masks over the entire retinal space. It is expected that some
overlapping sub-regions of the retinal space will be permitted in the design of
feature masks at this stage of the experiment, to provide some insight into the
effects of the limited size of the sub-regions. Some 16 to 32 feature masks
will be designed in conjunction with a "best" final decision logic. These will
be used in the next experiment for comparison with the random mask approach.

Most of the calculations required of the computer will involve
solving the equations listed in the paragraphs below.

Detailed Calculations for Linear Discriminant Function


Solve for $b_i$, where

$$b_i x_i = \Big( \sum_{j=1}^{50} s^{ij} \big( \bar{x}_j^{(1)} - \bar{x}_j^{(2)} \big) \Big) x_i \qquad (7\text{-}5)$$

or

$$b_i = s^{i1} d_1 + s^{i2} d_2 + \cdots + s^{i,50} d_{50} \qquad (7\text{-}6)$$

where

$$d_i = \bar{x}_i^{(1)} - \bar{x}_i^{(2)} \qquad (7\text{-}7)$$

and $s^{ij}$ is an element of $S^{-1}$, the inverse of the matrix $S$ defined in
Equation 6-18, Section 6.

* i.e., detail-detected representations using the thresholded Laplacian.

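Repeat for six different subsets of $\{x\}$ of 50 elements each. For all $b_i$
such that $|b_i|$ exceeds the significance threshold ($k = 1, \ldots, 6$), compute the statistics

$$Y_{jg} = \sum_{i=1}^{M_1} b_i\, x_{ijg} \qquad (j = 1, \ldots, S_g;\ g = 1, 2) \qquad (7\text{-}8)$$

$$\bar{Y}_1 = \frac{1}{S_1} \sum_{j=1}^{S_1} Y_{j1} \qquad (7\text{-}9)$$

$$\bar{Y}_2 = \frac{1}{S_2} \sum_{j=1}^{S_2} Y_{j2} \qquad (7\text{-}10)$$

where $x_{ijg}$ is the response of the $i$th retinal element for the $j$th sample of class $g$, and

$S_1$ = number of target samples of class 1
$S_2$ = number of target samples of class 2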


The estimated computer time on the Philco 2000 required to complete
computations of all $Y_{jg}$'s, $\bar{Y}_1$'s, and $\bar{Y}_2$'s is less than 30 seconds without print-out.
Print-out of all 300 $b_i$'s and all $Y_{jg}$'s, $\bar{Y}_1$'s, and $\bar{Y}_2$'s will be required to provide
analog data for engineering analysis and evaluation. An additional 15
seconds of computer time is estimated for print-out.
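The linear-discriminant steps above (weights $b = S^{-1}d$, retention of the largest weights, and the class means of the mask output) can be sketched in numpy as follows. Two substitutions are assumptions: $S$ is taken as the pooled within-class sample covariance, standing in for the matrix of Equation 6-18, and "significant" is approximated by keeping the $M_1$ weights of largest magnitude rather than by the Mahalanobis test. A pseudo-inverse is used because binary elements that never change produce a singular $S$. All names are hypothetical.

```python
import numpy as np


def fisher_weights(X1, X2):
    """b = S^{-1} d with d_i = mean_i(class 1) - mean_i(class 2)  (Equations 7-6, 7-7).

    S is the pooled within-class sample covariance (assumed stand-in for Equation 6-18);
    the pseudo-inverse guards against a singular S.
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2 = len(X1), len(X2)
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return np.linalg.pinv(S) @ d


def keep_largest(b, m1):
    """Zero all but the m1 weights of largest magnitude (stand-in for a significance test)."""
    b = np.asarray(b, float)
    kept = np.zeros_like(b)
    idx = np.argsort(np.abs(b))[::-1][:m1]
    kept[idx] = b[idx]
    return kept


def class_means(b, X1, X2):
    """Mean mask output Y_jg = sum_i b_i x_ijg for each class (Equations 7-8 to 7-10)."""
    return (float(np.mean(np.asarray(X1, float) @ b)),
            float(np.mean(np.asarray(X2, float) @ b)))
```

With the full weight vector, the class-1 mean output exceeds the class-2 mean by $d^T S^{-1} d \ge 0$, which is the separation the Fisher direction is built to maximize relative to within-class spread.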

Calculation for Likelihood Ratio with Assumed Normality


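Compute

$$a_{ij} = -\big(v_1^{ij} - v_2^{ij}\big) \qquad (i, j = 1, \ldots, 50) \qquad (7\text{-}11)$$

and

$$b_i = 2\Big\{ \big(m_1^{(1)} v_1^{i1} - m_1^{(2)} v_2^{i1}\big) + \big(m_2^{(1)} v_1^{i2} - m_2^{(2)} v_2^{i2}\big) + \cdots + \big(m_N^{(1)} v_1^{iN} - m_N^{(2)} v_2^{iN}\big) \Big\} \qquad (7\text{-}12)$$

Keeping significant values of $b_i$ and $a_{ij}$, compute

$$Y_k(s) = \sum_{i=1}^{M_1} b_i x_i + \sum_{i,j=1}^{M_2} a_{ij} x_i x_j \qquad (s = 1, \ldots, 100;\ k = 1, \ldots, 6) \qquad (7\text{-}13)$$

where

$M_1$ = number of significant values of the $b_i$'s
$M_2$ = number of significant pairs, i.e., of the $a_{ij}$'s
$s$ = index over target samples
$k$ = index over the features for which masks are required.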


With print-out of the sample means, sample covariances, and the coefficients,
the automatic design of six masks will require less than one minute
of computer time.
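Under the normality assumption, the coefficients in the scaling of Equations 7-11 and 7-12 are twice the quadratic and linear parts of the log likelihood ratio, $a_{ij} = -(v_1^{ij} - v_2^{ij})$ and $b_i = 2\sum_k (m_k^{(1)} v_1^{ik} - m_k^{(2)} v_2^{ik})$. In the sketch below, $v_g^{ij}$ is taken to be an element of the inverse of the class-$g$ sample covariance matrix. That reading, the use of a pseudo-inverse, and the function names are assumptions.

```python
import numpy as np


def gaussian_quadratic_coefficients(X1, X2):
    """Coefficients scaled as twice the log likelihood ratio of two normal populations:
    a_ij = -(v1^ij - v2^ij),  b_i = 2 * sum_k (m_k^(1) v1^ik - m_k^(2) v2^ik),
    with v_g^ij taken as elements of the inverse covariance matrix of class g (assumed).
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    V1 = np.linalg.pinv(np.cov(X1, rowvar=False))
    V2 = np.linalg.pinv(np.cov(X2, rowvar=False))
    A = -(V1 - V2)
    b = 2 * (V1 @ m1 - V2 @ m2)
    return A, b


def quadratic_score(x, A, b):
    """x^T A x + b^T x: twice the log likelihood ratio, up to an additive constant."""
    x = np.asarray(x, float)
    return float(x @ A @ x + b @ x)
```

Because only the additive constant is dropped, differences in `quadratic_score` between two patterns equal the corresponding differences in $-(x-m_1)^T V_1 (x-m_1) + (x-m_2)^T V_2 (x-m_2)$; the constant is absorbed into the threshold.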

Calculation for Second-Order Binary Discriminant Function
(Lazarsfeld-Bahadur Expansion)

Compute

$$a_{ij} = u_{ij}^{(1)} - u_{ij}^{(2)} \qquad (7\text{-}14)$$

where

$$u_{ij}^{(g)} = \frac{r_{ij}^{(g)}}{\sqrt{m_i^{(g)}\big(1 - m_i^{(g)}\big)\, m_j^{(g)}\big(1 - m_j^{(g)}\big)}} \qquad (g = 1, 2) \qquad (7\text{-}15)$$

and

$$b_i = a_i + \sum_{j \ne i}^{N} \big( -m_j^{(1)} u_{ij}^{(1)} + m_j^{(2)} u_{ij}^{(2)} \big) \qquad (7\text{-}16)$$

with

$$a_i = \log \left[ \frac{m_i^{(1)} \big(1 - m_i^{(2)}\big)}{m_i^{(2)} \big(1 - m_i^{(1)}\big)} \right] \qquad (7\text{-}17)$$

(Note: see Section 6 for the approximations made in determining these coefficients.)

Keeping significant weighting coefficients, compute six feature mask responses:

$$Y_k(s) = \sum_{i=1}^{M_1} b_i x_i + \sum_{i,j=1}^{M_2} a_{ij} x_i x_j \qquad (s = 1, \ldots, 100;\ k = 1, \ldots, 6) \qquad (7\text{-}18)$$

as in the likelihood ratio technique described above.

With print-out of the sample means, sample covariances, and the coefficients,
the automatic design of six masks will require less than one minute
of computer time.
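The second-order binary approximation can be sketched as follows, under stated assumptions. Each class density is approximated by the Lazarsfeld-Bahadur expansion truncated at pairwise terms, and $\log(1 + y)$ is approximated by $y$. Expanding the pairwise terms then gives $u_{ij} = r_{ij}/\sqrt{m_i(1-m_i)m_j(1-m_j)}$, pair weights $u_{ij}^{(1)} - u_{ij}^{(2)}$, and linear weights $b_i = a_i + \sum_{j \ne i}(m_j^{(2)} u_{ij}^{(2)} - m_j^{(1)} u_{ij}^{(1)})$. Sample correlations of the standardized responses serve as $r_{ij}$, and the means are clipped away from 0 and 1. These choices, and the function names, are illustrative rather than the report's exact procedure.

```python
import numpy as np


def lazarsfeld_bahadur_coefficients(X1, X2, eps=1e-6):
    """Pairwise (second-order) approximation to the log likelihood ratio for binary data.

    Returns (a_i, b_i, A) with A[i, j] = u_ij^(1) - u_ij^(2) (zero diagonal).
    """
    def stats(X):
        X = np.asarray(X, float)
        m = np.clip(X.mean(axis=0), eps, 1 - eps)   # keep m strictly inside (0, 1)
        z = (X - m) / np.sqrt(m * (1 - m))           # standardized responses
        r = (z.T @ z) / len(X)                       # sample correlations r_ij
        u = r / np.sqrt(np.outer(m * (1 - m), m * (1 - m)))
        np.fill_diagonal(u, 0.0)                     # only pairwise (i != j) terms
        return m, u

    m1, u1 = stats(X1)
    m2, u2 = stats(X2)
    a_lin = np.log(m1 * (1 - m2) / (m2 * (1 - m1)))
    b = a_lin + u2 @ m2 - u1 @ m1
    A = u1 - u2
    return a_lin, b, A


def lb_score(x, b, A):
    """sum_i b_i x_i + sum_{i<j} A_ij x_i x_j (A symmetric with zero diagonal)."""
    x = np.asarray(x, float)
    return float(b @ x + 0.5 * x @ A @ x)
```

Summing over each unordered pair once is why the quadratic form carries the factor 1/2.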

Final Decision Logic and Comparison Experiment

A choice of the discriminant function to be used in the final decision
will be made on the basis of experimental results. Each set of six masks
derived by the three methods will be used to generate inputs to the final decision.
Scores or frequencies of the two kinds of misclassification will be accumulated
on the basis of six masks as a function of the threshold of the final
decision. With print-out of the final decision statistics for each target sample,
computer time for this phase is expected to be from two to five minutes.
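A sketch of this comparison step: form the final statistic of Equation 7-3 from the six binary mask outputs, then count the two kinds of misclassification at a given final threshold θ, deciding "tank" when $Z > \theta$. The coefficient values in the usage note are made up for illustration, and the function names are hypothetical.

```python
def final_statistic(y_binary, c, d):
    """Z = sum_r c_r y_r + sum_{r,q} d_rq y_r y_q  (form of Equation 7-3), with y_r in {0, 1}."""
    n = len(y_binary)
    z = sum(c[r] * y_binary[r] for r in range(n))
    z += sum(d[r][q] * y_binary[r] * y_binary[q] for r in range(n) for q in range(n))
    return z


def misclassification_counts(tank_z, nontank_z, theta):
    """(missed tanks, false alarms) when deciding 'tank' if Z > theta."""
    missed = sum(1 for z in tank_z if z <= theta)
    false_alarms = sum(1 for z in nontank_z if z > theta)
    return missed, false_alarms
```

Sweeping `theta` over the range of observed $Z$ values and recording `misclassification_counts` at each step gives the error frequencies as a function of the final threshold. For instance, `final_statistic([1, 0, 1], [1, 2, 3], [[0, 0, 1], [0, 0, 0], [0, 0, 0]])` evaluates to 1 + 3 + 1 = 5.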

These results will be used as a guide for the final design of the simulated
tank recognition device. From 16 to 32 masks will be designed by the preferred
discriminant technique, i.e., an additional 10 to 26 masks. Depending
upon the results of previous phases, the required computer time may be as
much as one hour or as little as a few minutes.
SECTION 9


PLANS  FOR  THE  NEXT  INTERVAL 


Implementation  Studies 

The  problems  inherent  in  parallel  access  optical  correlators  will  be 
examined  in  greater  detail.  In  particular,  we  will  study  the  feasibility  of 
layering  subdecisions  using  cascaded  multimask  arrays  or  mosaics  of  optical 
correlators  to  achieve  data  rates  competitive  with  multiple  cross-correlation 
techniques  based  on  scanning. 


Computer Simulation Experiments


Work will continue on the experiments described in Section 7. Computer
programs will be prepared for feature mask design, for generating random
mask outputs, and for computing mask and final decision logic outputs. These
computer programs will operate on the training sample (fifty tanks and fifty
non-tanks) to generate print-outs from which alternative feature mask designs
will be evaluated.

Computer  simulation  of  performance  of  feature  masks  and  random  masks 
on  unknown  samples  will  be  completed  if  time  permits. 
IDENTIFICATION OF KEY TECHNICAL PERSONNEL

| Name | Title | Man-Hours (11/30/62) | Man-Hours (2/28/63) |
|---|---|---|---|
| *T. J. B. Shanley | Manager, Recognition Laboratory | indirect labor | indirect labor |
| T. J. Harley, Jr. | Research Group Supervisor (Project Manager) | 257 | 411 |
| J. S. Bryan | Research Section Manager (Chief Project Scientist) | 148 | 238 |
| J. B. Chatten | Manager | 45 | 19 |
| *L. Kanal | Manager | 93 | 115 |
| *W. F. Werner | Research Specialist | 12 | 0 |
| *D. R. Taylor | Research Specialist | 0 | 212 |
| *J. Z. Grayum | Research Specialist | 0 | 149 |
| *C. Gumacos | Research Specialist | 0 | 213 |
| *H. G. Kellett | Senior Engineer | 0 | 320 |
| *J. R. Richards | Senior Engineer | 48 | 458 |
| H. H. Schaffer | Senior Engineer | 426 | 40 |
| *J. Mantell | Physicist | 0 | 265 |
| *H. Domabyl | Engineer | 0 | 248 |

* Biographies of these personnel are included in Appendix I.
APPENDIX A


LAPLACIAN AND GRADIENT PREPROCESSING


1. Introduction


Two candidate preprocessing techniques are presently under detailed
consideration for modifying input video data prior to its introduction into an
imagery screening cross-correlator. They are the Laplacian (spatial band-pass
filtering) and the gradient magnitude (spatial differentiation). The Laplacian
operation eliminates absolute brightness scale as well as low spatial frequencies,
which are of little consequence in screening operations. Gradient magnitude
detection is a non-linear operation that responds to edges in the scanned
photograph. Each technique has distinct advantages, and either may be
implemented simply with short electrical delay lines.

The chief advantage of the Laplacian is that it responds linearly to a large
fraction of the relevant input data, while the gradient rejects more data and
presents the retained data in a form more readily usable in a later
cross-correlation process.

Delay-line implementations of the two preprocessing techniques are
similar in concept. A ribbon scan is assumed (see Section 4.3); Figure A-1
shows part of the path followed by the scanning spot image prior to its arrival
at point A on the transparency. Through insertion of proper delays, simultaneous
sensing at any number of points A through F is possible. The Laplacian
approximation L may be taken as

$$L = V_D - \tfrac{1}{4}\,(V_A + V_C + V_E + V_F) \qquad (A\text{-}1)$$

and the gradient magnitude $|G|$ as

$$|G| = \sqrt{(V_A - V_E)^2 + (V_B - V_D)^2} \qquad (A\text{-}2)$$

Alternately, the gradient squared may be obtained:

$$|G|^2 = (V_A - V_E)^2 + (V_B - V_D)^2 \qquad (A\text{-}3)$$

The quantities $V_A$, $V_B$, ... are the voltages sensed at the points A, B, ...
in Figure A-1.


In  the  following  discussion  the  two  delay  line  preprocessing  techniques  are 
examined  to  determine  their  signal-to-noise  ratio  capabilities. 

2.  Laplacian 


Figure A-1 illustrates simultaneous delay-line sensing of several data
points in an array of fixed form. The Laplacian (L) at sample point D may be
obtained approximately by taking

$$L_D = V_D - \tfrac{1}{4}\,(V_A + V_C + V_E + V_F) \qquad (A\text{-}4)$$

The voltage (V) sensed at each point is simply the input video delayed by some fixed
interval, plus noise that is assumed to be gaussian and independent of that at
the other terminals. The output noise component ($N_D$) due to the Laplacian
operation above is simply the weighted rms summation of the noise components
(rms value $\sigma_i$ at each terminal), i.e.,

$$N_D = \sqrt{\sigma_i^2 + 4\left(\tfrac{\sigma_i}{4}\right)^2} = \sqrt{\tfrac{5}{4}}\;\sigma_i \approx 1.118\,\sigma_i \qquad (A\text{-}5)$$

Because the Laplacian signal output differs in form from that at the input,
it is necessary to define criteria by which signal-to-noise ratio comparisons
may be made. For example, we may take as the input signal the video due to
scanning vertically across a horizontal edge, and let $(S/N)_{in}$ represent the ratio
of the resulting video step $S$ to the rms noise in the video signal, so that $(S/N)_{in} = S/\sigma_i$. Since the Laplacian
and gradient-squared functions will generally be used in conjunction with threshold
detectors, it is reasonable to define $(S/N)_{out}$ as the ratio of the peak
deviation of the signal from zero to the rms noise in the output. When a
horizontal transition lies between points C and D, the instantaneous output
signal amplitude is one quarter of the input step. When the transition lies
between D and E, the output is one quarter of the input step, but of the opposite
polarity. The relationship between output and input signal-to-noise ratio may
now be written for this case as

$$(S/N)_{out} = \frac{S/4}{1.118\,\sigma_i} = 0.223\,(S/N)_{in} \qquad (A\text{-}6)$$

Similarly, it may be shown for a line of width approximately equal to element
spacing that

$$(S/N)_{out} = 0.446\,(S/N)_{in} \qquad (A\text{-}7)$$

and for a point of about this diameter

$$(S/N)_{out} = 0.892\,(S/N)_{in} \qquad (A\text{-}8)$$


While these figures are obtained for specific inputs, it is believed that
they are indicative of the performance to be expected from the delay-line Laplacian
detector.
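The three ratios follow from one piece of arithmetic: the Laplacian's noise gain $\sqrt{1 + 4(1/4)^2} = \sqrt{5/4}$, divided into the fraction of the input step that appears at the output peak. The edge fraction (one quarter) is stated above. The line and point fractions (one half and the full step) are inferred here from the ratios of Equations A-7 and A-8, which are two and four times that of A-6. The sketch gives about 0.224, 0.447 and 0.894, which the text rounds down to 0.223, 0.446 and 0.892.

```python
from math import sqrt

# Noise gain of L_D = V_D - (V_A + V_C + V_E + V_F)/4 with independent noise of equal
# rms value sigma at each tap: sqrt(1 + 4 * (1/4)**2) = sqrt(5/4)  (Equation A-5).
NOISE_GAIN = sqrt(1 + 4 * 0.25**2)


def laplacian_snr_factor(peak_output_fraction):
    """(S/N)_out / (S/N)_in when the output peak is the given fraction of the input step."""
    return peak_output_fraction / NOISE_GAIN


for case, fraction in [("edge", 0.25), ("line", 0.5), ("point", 1.0)]:
    print(case, round(laplacian_snr_factor(fraction), 4))
```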

3.  Gradient  Squared 

The second preprocessing technique to be considered is the gradient
(magnitude) squared. By this process, level changes are detected without
regard to the direction of the change. The delay-line gradient-squared detector
operates on the points A, B, D, and E in Figure A-1, and the equation of this
operation is (ignoring noise)

$$|G|^2 = (V_A - V_E)^2 + (V_B - V_D)^2 \qquad (A\text{-}9)$$


Since  there  is  a  signal  component  and  a  noise  component  associated  with  each 
of  the  four  points,  it  is  convenient  to  rewrite  Equation  A-9  with  noise  terms 
included.  The  output  voltage  (VQ )  is  thus  found  to  be 


$V_G = \left[(V_A - V_E) + (n_A - n_E)\right]^2 + \left[(V_B - V_D) + (n_B - n_D)\right]^2$ (A-10)

$\quad = \left[(V_A - V_E)^2 + (V_B - V_D)^2\right] + 2\left[(V_A - V_E)(n_A - n_E) + (V_B - V_D)(n_B - n_D)\right] + \left[(n_A - n_E)^2 + (n_B - n_D)^2\right]$ (A-11)

$\quad = V_1(S) + V_2(S,N) + V_3(N)$ (A-12)

with $V_1(S) = G^2$.


$V_2(S,N)$ is a linear, signal-weighted noise term whose rms value $\sigma_{V_2}$ is

$\sigma_{V_2} = 2\left[(V_A - V_E)^2 (2\sigma_i^2) + (V_B - V_D)^2 (2\sigma_i^2)\right]^{1/2}$ (A-13)

$\quad = 2\sqrt{2}\,\sigma_i \left[(V_A - V_E)^2 + (V_B - V_D)^2\right]^{1/2}$ (A-14)

$\quad = 2\sqrt{2}\,\sigma_i\,|G|$ (A-15)

where $\sigma_i$ is the standard deviation of the noise at each delay-line terminal.


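$V_3(N)$ is a square-law summation of two random noise terms, resulting in a chi-square distribution.¹ Since each noise term $n_A$, $n_E$, $n_B$, $n_D$ has rms value $\sigma_i$, we may let $X_1 = n_A - n_E$ and $X_2 = n_B - n_D$, with $\sigma_{X_1} = \sigma_{X_2} = \sigma_x = \sqrt{2}\,\sigma_i$.

Then the chi-square distribution (two degrees of freedom) gives

$f(V_3) = \frac{1}{2\sigma_x^2}\, e^{-V_3/(2\sigma_x^2)}, \quad V_3 \ge 0$ (A-16)

The mean ($\mu_{V_3}$) and standard deviation ($\sigma_{V_3}$) of this density function are equal, and

$\mu_{V_3} = \sigma_{V_3} = 2\sigma_x^2$ (A-17)

Since $\sigma_x = \sqrt{2}\,\sigma_i$,

$\mu_{V_3} = \sigma_{V_3} = 4\sigma_i^2$ (A-18)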
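A Monte Carlo check of Equations A-16 to A-18, offered as a sketch: it assumes independent zero-mean Gaussian noise of rms value $\sigma_i$ at the four taps (the chi-square argument implies Gaussian noise, but the section does not say so explicitly) and sets $\sigma_i = 1$, so the sample mean and standard deviation of $V_3$ should both come out near $4\sigma_i^2 = 4$. The function name is illustrative.

```python
import random
import statistics


def v3_samples(sigma_i, trials, seed=1):
    """Draw samples of V3 = (nA - nE)^2 + (nB - nD)^2 with i.i.d. Gaussian tap noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        n_a, n_b, n_d, n_e = (rng.gauss(0.0, sigma_i) for _ in range(4))
        samples.append((n_a - n_e) ** 2 + (n_b - n_d) ** 2)
    return samples


sigma_i = 1.0
samples = v3_samples(sigma_i, 100_000)
mean = statistics.fmean(samples)
std = statistics.pstdev(samples)
print(f"mean {mean:.3f}, std {std:.3f}, predicted {4 * sigma_i ** 2:.3f}")
```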


As with the Laplacian, it is necessary to set up conditions under which the output signal-to-noise ratio may be compared with that at the input to the gradient detector. We once again consider a horizontal edge, with $(S/N)_{in}$ the ratio of this step to the rms noise at any tap. It is clear that the output due to this edge is maximum when the edge passes through the center of the array, where both tap differences equal the input step and $G^2$ is twice the square of the input step. Therefore,

$G = \sqrt{2}\,S_{in}$ (A-19)

$G^2 = 2S_{in}^2$ (A-20)

and we now have the ingredients necessary to determine how gradient detection affects signal-to-noise ratio.

¹ Emanuel Parzen, Modern Probability Theory and Its Applications, John Wiley and Sons, New York, 1960, p. 181.

It  is  convenient  to  obtain  expressions  for  very  high  and  very  low  signal-to- 
noise  ratios. 

a. High Signal-to-Noise Ratio

In this case, the linear term $V_2(S,N)$ of Equation A-12 predominates, and we may write

$(S/N)_{out} = \frac{2S_{in}^2}{2\sqrt{2}\,\sigma_i \cdot \sqrt{2}\,S_{in}} = 0.5\,(S/N)_{in}$ (A-21)

b. Low Signal-to-Noise Ratio

The quadratic term $V_3(N)$ of Equation A-12 predominates in this case, and

$(S/N)_{out} = \frac{2S_{in}^2}{4\sigma_i^2} = 0.5\,(S/N)_{in}^2$ (A-22)

In general, any practical system will have a high signal-to-noise ratio, and Equation A-21 will apply. Few signal-to-noise ratio problems are therefore expected to arise in gradient detection.

Since different system concepts may be necessary to utilize the two outputs most effectively, it would be presumptuous to attempt an absolute comparison between the Laplacian and the gradient magnitude in terms of imagery screening capability. However, because so little loss in signal-to-noise ratio results from gradient magnitude detection, and because simplified screening logic may be used with this type of preprocessing, the gradient process appears more attractive at present.
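The two limiting cases can be joined into a single curve. Because $V_2$ is odd and $V_3$ even in zero-mean Gaussian noise, the two terms are uncorrelated and their variances add; with $G^2 = 2S_{in}^2$ the output noise is $\sqrt{\sigma_{V_2}^2 + \sigma_{V_3}^2} = 4\sigma_i\sqrt{S_{in}^2 + \sigma_i^2}$, so $(S/N)_{out} = 0.5\rho^2/\sqrt{\rho^2 + 1}$ with $\rho = (S/N)_{in}$. This combination is our own extension of Equations A-15 and A-18, not a result stated in the report. The sketch below evaluates it and checks it against a direct simulation of Equation A-10, assuming Gaussian tap noise and an edge through the array center; function names are illustrative.

```python
import math
import random
import statistics


def snr_out_closed_form(snr_in):
    """2 S^2 / sqrt(sigma_V2^2 + sigma_V3^2) with sigma_i = 1 and S = snr_in.

    sigma_V2 = 2*sqrt(2)*|G| = 4*S (A-15 with A-19); sigma_V3 = 4 (A-18).
    """
    return 0.5 * snr_in ** 2 / math.sqrt(snr_in ** 2 + 1.0)


def snr_out_monte_carlo(snr_in, trials=100_000, seed=2):
    """Simulate V_G (A-10) for an edge through the array centre, with sigma_i = 1."""
    rng = random.Random(seed)
    step = snr_in
    outputs = []
    for _ in range(trials):
        n_a, n_b, n_d, n_e = (rng.gauss(0.0, 1.0) for _ in range(4))
        # Edge through the centre: V_A - V_E = V_B - V_D = step, so G^2 = 2 step^2 (A-20).
        outputs.append((step + n_a - n_e) ** 2 + (step + n_b - n_d) ** 2)
    signal = 2.0 * step ** 2                     # V1(S) = G^2
    return signal / statistics.pstdev(outputs)  # rms of V2 + V3 about the mean


for snr_in in (0.1, 1.0, 10.0, 100.0):
    print(snr_in, round(snr_out_closed_form(snr_in), 4))
```

Equations A-21 and A-22 appear as the large- and small-argument behavior of this curve; near $(S/N)_{in} = 1$ both approximations overstate the output ratio.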
APPENDIX B


EQUIVALENCE  OF  SPATIAL  FREQUENCY 
FILTERING  AND  CROSS  CORRELATION  TECHNIQUES 


The coherent-light optical spatial filtering technique provides an exact two-dimensional analog of conventional passive filtering in the one-dimensional time domain. Because of the apparent generality of the technique, it is sometimes assumed to be superior in its capabilities to the techniques that work in the spatial domain, such as the lensless correlograph and the scanner/tapped delay-line cross-correlator. In fact, these spatial-domain techniques are completely equivalent to optical spatial frequency filtering. The mathematical statement relating the spatially filtered output $f(x,y)$ to the raw input $r(x,y)$ in the ideal coherent-light optical spatial filter is as follows.

$f(x,y) = \mathcal{F}^{-1}\left[F(\omega_x,\omega_y)\right]$ (B-1)

$F(\omega_x,\omega_y) = R(\omega_x,\omega_y) \cdot G(\omega_x,\omega_y)$ (B-2)

$R(\omega_x,\omega_y) = \mathcal{F}\left[r(x,y)\right]$ (B-3)

where $G(\omega_x,\omega_y)$ is the filter function. In general, $r(x,y)$ will be real, but the other functions may be complex, i.e.,

$f(x,y) = f_r(x,y) + j f_i(x,y)$ (B-4)

$F(\omega_x,\omega_y) = F_r(\omega_x,\omega_y) + j F_i(\omega_x,\omega_y)$ (B-5)

$G(\omega_x,\omega_y) = G_r(\omega_x,\omega_y) + j G_i(\omega_x,\omega_y)$ (B-6)

$R(\omega_x,\omega_y) = R_r(\omega_x,\omega_y) + j R_i(\omega_x,\omega_y)$ (B-7)

and $r(x,y) = r_r(x,y)$. (B-8)

Note that, in general, $G(\omega_x,\omega_y)$ has an inverse transform¹

$g(x,y) = \mathcal{F}^{-1}\left[G(\omega_x,\omega_y)\right] = g_r(x,y) + j g_i(x,y)$.

The output $f(x,y)$ can be expressed equally well by the expression

$f(x,y) = r(x,y) * g(x,y)$ (B-9)

where $*$ is the symbol for convolution, i.e.,

$\mathcal{F}\left[r(x,y) * g(x,y)\right] = R(\omega_x,\omega_y) \cdot G(\omega_x,\omega_y)$ (B-11)


The complete expression of the convolution is

$f(x,y) = r(x,y) * g(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} r(x-\xi,\, y-\eta)\, g(\xi,\eta)\, d\xi\, d\eta$ (B-12)

Equation B-12 is equivalent to the cross-correlation of $r(x,y)$ with the reflected aperture $g(-x,-y)$, which is the operation performed by the lensless correlograph and by the scanner/delay-line cross-correlator techniques.
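The equivalence can be checked numerically. The sketch below is a discrete stand-in for the continuous equations, under stated assumptions: the Fourier integrals are replaced by a naive two-dimensional DFT on a small periodic grid, so convolution becomes circular, and the 4 × 4 arrays are arbitrary test values. It confirms that multiplying transforms (B-1, B-2) gives the same output as the convolution sum (B-12), that this equals cross-correlation with the reflected aperture $g(-x,-y)$, and that a complex $g$ can be handled as two real correlations (B-13). All function names are illustrative.

```python
import cmath


def dft2(a):
    """Naive 2-D DFT of an n x n list of lists (O(n^4); fine for tiny test arrays)."""
    n = len(a)
    return [[sum(a[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idft2(spectrum):
    """Inverse of dft2, including the 1/n^2 normalisation."""
    n = len(spectrum)
    return [[sum(spectrum[u][v] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
                 for u in range(n) for v in range(n)) / n ** 2
             for y in range(n)] for x in range(n)]


def convolve(r, g):
    """Circular form of (B-12): sum over (xi, eta) of r(x - xi, y - eta) * g(xi, eta)."""
    n = len(r)
    return [[sum(r[(x - xi) % n][(y - eta) % n] * g[xi][eta]
                 for xi in range(n) for eta in range(n))
             for y in range(n)] for x in range(n)]


def correlate(r, h):
    """Circular cross-correlation: sum over (xi, eta) of r(x + xi, y + eta) * h(xi, eta)."""
    n = len(r)
    return [[sum(r[(x + xi) % n][(y + eta) % n] * h[xi][eta]
                 for xi in range(n) for eta in range(n))
             for y in range(n)] for x in range(n)]


def max_diff(p, q):
    """Largest element-wise magnitude of p - q."""
    return max(abs(p[i][j] - q[i][j]) for i in range(len(p)) for j in range(len(p[0])))


# Arbitrary 4 x 4 test data: real input r, complex filter kernel g.
r = [[1, 2, 0, 3], [0, 1, 4, 1], [2, 0, 1, 0], [1, 3, 0, 2]]
g = [[1 + 1j, 0.5, 0, 0], [0.25j, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, -0.5]]
n = len(r)

# Frequency-domain route, (B-1) and (B-2): multiply the transforms, then invert.
R, G = dft2(r), dft2(g)
filtered = idft2([[R[u][v] * G[u][v] for v in range(n)] for u in range(n)])

# Spatial-domain routes: the convolution sum, and correlation with the reflected aperture.
direct = convolve(r, g)
g_reflected = [[g[(-x) % n][(-y) % n] for y in range(n)] for x in range(n)]
via_correlation = correlate(r, g_reflected)

# (B-13): the complex aperture handled as two real correlations run side by side.
re_part = correlate(r, [[complex(v).real for v in row] for row in g_reflected])
im_part = correlate(r, [[complex(v).imag for v in row] for row in g_reflected])
two_channel = [[re_part[x][y] + 1j * im_part[x][y] for y in range(n)] for x in range(n)]

print(max_diff(filtered, direct), max_diff(direct, via_correlation), max_diff(direct, two_channel))
```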

Throughout the discussion, the function $g(x,y)$ has been assumed to be complex. However, this does not limit the applicability of the cross-correlator techniques, which work only with real functions. The complex function $g(x,y)$ can be represented by the sum of a real function and an imaginary function:

$g(x,y) = g_r(x,y) + j g_i(x,y)$ (B-13)


The correlation of $r(x,y)$ with $g_r(x,y)$, and of $r(x,y)$ with $g_i(x,y)$, can be carried out separately (but simultaneously in the delay-line cross-correlator device). Thus, the same information is available at the cross-correlator output as is available in the complex output of the coherent-light optical spatial filter. Because of the nature of practical light sensors, the output from the optical spatial filter device will be measured in terms of quadratic content

$|f(x,y)|^2 = f(x,y)\, f^*(x,y) = f_r^2(x,y) + f_i^2(x,y)$ (B-14)

With the cross-correlation techniques, the real and imaginary components can be treated separately.

¹ $g_r = \mathcal{F}^{-1}\left[G_r(\omega_x,\omega_y)\right]$ and $g_i = \mathcal{F}^{-1}\left[G_i(\omega_x,\omega_y)\right]$.

The foregoing discussion has been carried out in terms of complex aperture functions in order to show the completely general equivalence between coherent-optical spatial filtering and cross-correlation. As a practical matter, however, useful apertures for spatial filtering are almost always purely real (no imaginary component). For mechanical convenience, investigators working with coherent optical systems often use filter functions that are unsymmetrical in ω, which result in complex apertures; however, there is no evidence that the use of complex apertures in the spatial domain provides any additional utility over purely real apertures.
APPENDIX C


EQUIVALENCE  OF  TWO  TEXTURE  MEASUREMENT  TECHNIQUES 


In this appendix, the equivalence of two methods of measuring "textural" features is demonstrated. In the first method, the output $\phi$ for a particular texture measurement is obtained by applying a linear discriminant function to the components of the Fourier power spectrum of the image. In the second method, the input image is cross-correlated with each of two apertures, these correlation functions are squared, and the difference of these squares is integrated over the entire frame, yielding the output $\phi$. It is shown that, with the appropriate choice of aperture functions, the two expressions for $\phi$ are equivalent.

Let $h(x,y)$ be the input image, let $B(\omega_x,\omega_y)$ be its Fourier transform, and let $B(\omega_x,\omega_y) \cdot B^*(\omega_x,\omega_y)$ be the Fourier power spectrum. $D(\omega_x,\omega_y)$ is a two-dimensional linear discriminant function which will operate on $|B(\omega_x,\omega_y)|^2$:

$\phi = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} D(\omega_x,\omega_y)\, B(\omega_x,\omega_y)\, B^*(\omega_x,\omega_y)\, d\omega_x\, d\omega_y$ (C-1)


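Define two real non-negative functions

$D_P(\omega_x,\omega_y) = D(\omega_x,\omega_y)$ when $D(\omega_x,\omega_y) > 0$, and 0 otherwise (C-2)

and

$D_N(\omega_x,\omega_y) = -D(\omega_x,\omega_y)$ when $D(\omega_x,\omega_y) < 0$, and 0 otherwise (C-3)

where

$D_P(\omega_x,\omega_y) - D_N(\omega_x,\omega_y) = D(\omega_x,\omega_y)$ (C-4)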
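In addition, define two aperture functions

$W_1(x,y) = \mathcal{F}^{-1}\left[\sqrt{D_P(\omega_x,\omega_y)}\right]$ (C-5)

$W_2(x,y) = \mathcal{F}^{-1}\left[\sqrt{D_N(\omega_x,\omega_y)}\right]$ (C-6)

Then let

$f_1(x,y) = W_1(x,y) * h(x,y) = \mathcal{F}^{-1}\left[\sqrt{D_P(\omega_x,\omega_y)} \cdot B(\omega_x,\omega_y)\right]$ (C-7)

and

$f_2(x,y) = W_2(x,y) * h(x,y) = \mathcal{F}^{-1}\left[\sqrt{D_N(\omega_x,\omega_y)} \cdot B(\omega_x,\omega_y)\right]$ (C-8)

The total quadratic content (proportional to power) of any function can be obtained by integrating over the function or over its transform:

$4\pi^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_1^2(x,y)\, dx\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} D_P(\omega_x,\omega_y)\, B(\omega_x,\omega_y)\, B^*(\omega_x,\omega_y)\, d\omega_x\, d\omega_y$ (C-9)

and

$4\pi^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2^2(x,y)\, dx\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} D_N(\omega_x,\omega_y)\, B(\omega_x,\omega_y)\, B^*(\omega_x,\omega_y)\, d\omega_x\, d\omega_y$ (C-10)

The expression for $\phi$ can be obtained from the combination of Equations C-4, C-9, and C-10 in Equation C-1:

$\phi = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[D_P(\omega_x,\omega_y) - D_N(\omega_x,\omega_y)\right] B(\omega_x,\omega_y)\, B^*(\omega_x,\omega_y)\, d\omega_x\, d\omega_y$ (C-11)

$\phi = 4\pi^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[f_1^2(x,y) - f_2^2(x,y)\right] dx\, dy$ (C-12)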


Note that for perfect equivalence, the integrals have to be taken over the range $-\infty$ to $\infty$. In practice, however, the image data of interest will be limited to a particular frame area, with the remainder of the range blacked out, and the aperture functions will be defined over a much smaller area, so that

$\phi = 4\pi^2 \iint_{\text{frame}} \left[f_1^2(x,y) - f_2^2(x,y)\right] dx\, dy$ (C-13)
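The equivalence of Equations C-1 and C-12 is Parseval's theorem applied to each aperture output. A minimal discrete check, under stated assumptions: the continuous transforms are replaced by a naive DFT on a small periodic 4 × 4 grid (so the factor $4\pi^2$ becomes $n^2$ for this transform convention), and the image $h$ and discriminant $D$ are random placeholder arrays, not texture data. Because a randomly chosen $D$ is not symmetric, the aperture outputs are complex here, so $|f|^2$ stands in for $f^2$.

```python
import cmath
import math
import random


def dft2(a):
    """Naive 2-D DFT of an n x n list of lists (unnormalised forward transform)."""
    n = len(a)
    return [[sum(a[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idft2(spectrum):
    """Inverse of dft2 (carries the 1/n^2 factor)."""
    n = len(spectrum)
    return [[sum(spectrum[u][v] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
                 for u in range(n) for v in range(n)) / n ** 2
             for y in range(n)] for x in range(n)]


n = 4
rng = random.Random(3)
h = [[rng.random() for _ in range(n)] for _ in range(n)]            # placeholder image
D = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]  # placeholder discriminant
B = dft2(h)

# First method (C-1): apply D directly to the power spectrum |B|^2.
phi_spectral = sum(D[u][v] * abs(B[u][v]) ** 2 for u in range(n) for v in range(n))

# Split D into non-negative parts (C-2, C-3) and form the two aperture outputs (C-7, C-8).
D_P = [[max(d, 0.0) for d in row] for row in D]
D_N = [[max(-d, 0.0) for d in row] for row in D]
f1 = idft2([[math.sqrt(D_P[u][v]) * B[u][v] for v in range(n)] for u in range(n)])
f2 = idft2([[math.sqrt(D_N[u][v]) * B[u][v] for v in range(n)] for u in range(n)])

# Second method (C-12): sum the difference of squared outputs over the frame.
phi_spatial = n ** 2 * sum(abs(f1[x][y]) ** 2 - abs(f2[x][y]) ** 2
                           for x in range(n) for y in range(n))

print(phi_spectral, phi_spatial)
```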
APPENDIX D


OPTICAL  EFFICIENCY  OF  THE  LENSLESS  CORRELOGRAPH 


In  order  to  evaluate  the  performance  of  sensors  for  use  in  the  image 
plane  of  the  lensless  correlograph,  it  is  necessary  to  determine  the  light 
flux  density  at  the  image  plane  (illuminance,  E)  that  will  result  when  the 
peak  value  (B)  of  the  luminous  emittance  (L)  is  specified  at  the  object  plane. 

For this derivation, assume that the input photograph is a Lambertian surface. The luminous emittance of any point of the photograph is

$L(x,y) = B\,\mu_o(x,y)$,

where $B$ is a constant corresponding to the emittance at peak white, and $\mu_o(x,y)$, $0 \le \mu_o \le 1$, is a function of location in the image. Consider the light from a single square picture element of area $a$ passing through an aperture with transmittance $\mu_a$, $0 \le \mu_a \le 1$, and impinging on a point in the image space (see Figure D-1). The illuminance at that point from this picture element area is given by

$\Delta E = \frac{L(x,y)\,\mu_a\,a}{2\pi d^2} = \frac{B\,\mu_o\,\mu_a\,a}{2\pi d^2}$ (D-1)


The total illuminance at a point in the image plane due to the cross-correlation of an input photograph with a weighted aperture is given by the following summation over all the $m$ elements of the aperture:

$E = \frac{B\,a}{2\pi d^2} \sum_{i=1}^{m} \mu_{o,i}\,\mu_{a,i}$ (D-2)

Let

$\bar{\mu} = \frac{1}{m} \sum_{i=1}^{m} \mu_{o,i}\,\mu_{a,i}$ (D-3)

then

$E = \frac{m\,\bar{\mu}\,B\,a}{2\pi d^2}$ (D-4)


If the input photograph is a square, $\sqrt{A_o}$ units along a side, then

$n = A_o / a$ (D-5)

is the total number of picture elements, and

$E = \frac{m\,\bar{\mu}\,B\,A_o}{2\pi n d^2}$ (D-6)


There is a rule of thumb for optical systems, intended to limit the general aberrations that reduce image quality and resolution, which states that the half angle subtended by the object with respect to a point on the optical axis in the aperture plane should be less than ten degrees.

Therefore, let

$\frac{\sqrt{A_o}/2}{r} \approx \tan 10^\circ$ (D-7)

which yields

$r^2 \approx 8 A_o$ (D-8)

Then

$E = \frac{m\,\bar{\mu}\,B\,r^2}{16\pi n d^2}$ (D-9)
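The constant 8 in Equation D-8 follows directly from D-7; a one-line check (the variable name is illustrative):

```python
import math

# (D-7): (sqrt(A_o) / 2) / r = tan(10 degrees)  ->  r^2 / A_o = 1 / (4 tan^2(10 degrees))
r_squared_over_area = 1.0 / (4.0 * math.tan(math.radians(10.0)) ** 2)
print(f"r^2 / A_o = {r_squared_over_area:.2f}")
```

This gives about 8.04, which the text rounds to 8.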


If an output sensor element has a linear dimension $a_i$, and the geometry of the correlograph is fixed to match the resolution of the sensor to the input, then $M = a_i / a_o$ (where $a_o$ is the linear dimension of an input picture element), and

$\frac{r}{d} = \frac{1}{1+M} = \frac{a_o}{a_i + a_o}$ (D-10)

then

$E = \frac{m\,\bar{\mu}\,B}{16\pi n} \left(\frac{a_o}{a_i + a_o}\right)^2$ (D-11)


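Consider the distributions of $\mu_o$ and $\mu_a$ to be uniform between 0 and 1. Then the expected values of these terms and their squares are

$\mathcal{E}(\mu_o) = \mathcal{E}(\mu_a) = 1/2$ (D-12)

and

$\mathcal{E}(\mu_o^2) = \mathcal{E}(\mu_a^2) = 1/3$; (D-13)

now

$\bar{\mu} = \frac{1}{m} \sum_{i=1}^{m} \mu_{o,i}\,\mu_{a,i}$ (D-14)

which is a sample estimate of $\mathcal{E}(\mu_o \mu_a)$, and has the mean value

$\mathcal{E}(\bar{\mu}) = \mathcal{E}(\mu_o \mu_a)$ (D-15)

If the picture and aperture are uncorrelated, then

$\mathcal{E}(\bar{\mu}) = \mathcal{E}(\mu_o)\,\mathcal{E}(\mu_a) = 1/4$ (D-16)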
On the other hand, if in some region the two are perfectly correlated, i.e., $\mu_o = \mu_a$, then

$\mathcal{E}(\bar{\mu}) = \mathcal{E}(\mu_o^2) = 1/3$ (D-17)

By far the most common situation will be one in which the object and aperture are uncorrelated. Therefore, the average value of illuminance will be

$\bar{E} = \frac{m\,B\,r^2}{64\pi n d^2}$ (D-18)
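The expected values in Equations D-16 and D-17 are easy to confirm by simulation. The sketch below assumes independent uniform transmittances for the uncorrelated case and $\mu_a = \mu_o$ for the matched case; the 900-element (30 × 30) aperture and the function name are hypothetical choices for illustration.

```python
import random
import statistics


def mean_product(m, correlated, rng):
    """Sample mean of mu_o * mu_a over m aperture elements, Equation D-14."""
    total = 0.0
    for _ in range(m):
        mu_o = rng.random()                          # uniform on [0, 1)
        mu_a = mu_o if correlated else rng.random()  # perfect match, or independent
        total += mu_o * mu_a
    return total / m


rng = random.Random(4)
M_ELEMENTS = 900  # hypothetical 30 x 30 aperture
uncorrelated = statistics.fmean(mean_product(M_ELEMENTS, False, rng) for _ in range(200))
correlated = statistics.fmean(mean_product(M_ELEMENTS, True, rng) for _ in range(200))
print(f"uncorrelated {uncorrelated:.4f}, correlated {correlated:.4f}, "
      f"ratio {correlated / uncorrelated:.3f}")
```

Since illuminance is proportional to $\bar{\mu}$ (Equation D-4), the ratio of about 4/3 is the relative rise in illuminance over a region where picture and aperture match.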


APPENDIX  E 


SURVEY  OF  METHODS 
FOR  SCANNING  PHOTOGRAPHIC  DATA 


1. Scanning System Requirements

A scanner is required that will accurately and rapidly convert photographic data into a video signal suitable for input to imagery screening circuitry. The full state-of-the-art capability must be utilized to obtain an optimum balance among (1) resolution, (2) speed of operation, (3) signal-to-noise ratio, (4) flexibility, and (5) cost.

The resolution capability required of a scanning device for aerial photograph imagery is dictated by several factors, among them:

a. size of film

b. resolution of film

c. ground resolution required

d. ground area to be covered, and

e. scanner limitations.

Giving due consideration to all of these factors, it has been concluded that a 5000-TV-line capability is sufficient for imagery screening operations. Thus an element size of one foot may be obtained on a photograph covering 5000 feet by 5000 feet. An element size of one foot appears sufficiently small to permit recognition of military vehicles and other objects of comparable size.


a. Resolution

Scanning 5000 elements in a single direction requires a highly refined scanning system in terms of resolution capability. It would be desirable to scan the 5000 elements in a single scan; however, the possibility of duplicating scanning equipment for parallel operation may also be considered.
b. Speed of Operation

Rapid screening of photographic data imposes stringent speed requirements on the scanner. It is desirable to process an entire 25 × 10⁶-element frame in a matter of a few seconds. Since ordinary scanners in current use operate below six megacycles, approximately eight seconds would be required to scan a data frame. It would be desirable to reduce this time requirement.

c. Signal-to-Noise Ratio

The signal-to-noise ratio requirement is dictated by the number of distinct levels to be discriminated. An adaptation of the channel capacity formula gives the relation between signal-to-noise ratio and the number (L) of distinct levels, i.e.,

$L = (1 + S/N)^{1/2}$ (E-1)

where $L$ is the number of levels, $S$ is the mean signal power, and $N$ is the noise power. For large signal-to-noise ratios, this equation reduces to

$L \approx I_s / I_n$ (E-2)

where $I_s$ is the rms signal current and $I_n$ is the rms noise current. Thus, the number of levels available is numerically equal to the signal-to-noise ratio, properly defined as a ratio of rms currents.


The availability of a given number of quantization levels determines the minimum signal-to-noise ratio that will justify use of all levels. The signal-to-noise ratio of scanning equipment is normally given in terms of peak current to the noise generated at that current, i.e.,

$\frac{I_{max}}{I_{n,max}} = \frac{2\sqrt{2}\,I_s}{I_{n,max}}$ (E-3)

Since $I_n$ (considered to be generated at a photosensitive surface) increases as the square root of signal current, its value at the mean signal level is $2^{-1/2} I_{n,max}$, i.e., $I_{n,max} = 2^{1/2} I_n$. Then

$\frac{I_{max}}{I_{n,max}} = \frac{2\sqrt{2}\,I_s}{\sqrt{2}\,I_n} = \frac{2 I_s}{I_n}$ (E-4)

$\frac{I_{max}}{I_{n,max}} = 2L$ (E-5)
Thus, for 64-level (6-bit) quantization, the indicated signal-to-noise ratio should be

$\frac{I_{max}}{I_{n,max}} = 2 \times 64 = 128$ (E-6)

A 128-to-1 (42 dB) signal-to-noise ratio is necessary to justify six-bit quantization.
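The result that the peak-to-noise current ratio must be twice the number of levels generalizes to any word length, and because it is a current ratio its decibel value uses 20 log10. A short sketch tabulating a few word lengths (the function names and loop range are illustrative):

```python
import math


def required_peak_snr(levels):
    """Peak-current-to-noise ratio needed to justify `levels` levels: 2L."""
    return 2 * levels


def current_ratio_to_db(ratio):
    """Express a current (amplitude) ratio in decibels."""
    return 20.0 * math.log10(ratio)


for bits in range(4, 9):
    levels = 2 ** bits
    snr = required_peak_snr(levels)
    print(f"{bits} bits: {levels} levels -> {snr}:1 ({current_ratio_to_db(snr):.1f} dB)")
```

Each additional bit doubles the required ratio, i.e. adds about 6 dB.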


d. Scan Pattern Flexibility

Imagery screening applications may necessitate the use of area scan and other unusual patterns. The area scan pattern may take the form of a 60-element ribbon scan moving horizontally across the full width of the scanner screen.

A 60-element ribbon height is required for registry testing of 30-element square areas. For a one-foot ground resolution element size, a 30-foot by 30-foot ground area is examined in the correlator; this permits recognition of military vehicles (typically 20 to 30 feet long) and objects of comparable size in a single parallel operation. Testing of all possible 30-element square arrays within a frame is facilitated by providing 50 percent vertical overlap, necessitating a 60-element-high ribbon scan.

This  overlap  requirement  results  in  unequal  intervals  between 
successive  scans  over  any  picture  element;  this  is  an  important  factor  in 
the  evaluation  of  storage  type  scanning  devices.  It  should  be  mentioned  that 
overlap  scanning  reduces  the  effective  element  rate  capability  of  a  scanning 
device. 
e. Low Cost

While this program is directed toward future equipment development, it is desirable to determine capabilities in both the long and the short term. For initial development equipment, certain sacrifices may be made to hold cost to a minimum. However, a view must be taken of future costs of more sophisticated equipment, and of possible cost reductions due to refinement of manufacturing techniques. More emphasis will be placed in this section on present costs, as they may be estimated more accurately.

2.  Candidate  Scanning  Devices 


a.  Camera  Tubes 


There are two fundamental types of camera tubes available: storage types and non-storage types. Storage tubes depend upon the integration of photoelectronic effects during the time when no read-out is taking place. Non-storage types, on the other hand, utilize these effects only during the read-out interval. It should be noted that a constant storage interval is necessary in storage tubes, whereas a constant read-out interval is required in non-storage types.

Several camera tubes have been introduced since the advent of television, but only two or three of these have undergone continued development. The image orthicon has been refined to a high degree of reliability for use in television cameras, and the vidicon, because of its simplicity and small size, has been developed for industrial television purposes. Both of these fall into the storage tube category. The most interesting non-storage type is the image dissector, which is used for special-purpose applications. A description of each of these tubes follows.

(1)  Image  Orthicon 

The  image  orthicon  combines  an  image  intensifier  section  and 
a  storage  tube  section.  In  addition,  photomultiplication  is  accomplished  in 
the  same  envelope.  The  combination  results  in  a  camera  tube  that  has  high 
sensitivity  and  an  output  current  capable  of  overriding  resistor  and  amplifier 
noise. 


At present, image orthicons are available with 900 TV line resolution. There are development programs under way to increase this to 1500 or more TV lines. There is some tendency toward burnt-in images on the image orthicon, but there is no theoretical lag to the signal. However, there seems to be little work in progress on utilizing the image orthicon for high frame rates.
(2) Vidicon


The  vidicon  is  a  simple  photoconductive  camera  tube  which 
can  be  made  small  in  size  while  still  retaining  high  resolution.  The  highest 
limiting  resolution  available  in  a  vidicon  is  about  1200  TV  lines.  No  known 
development  is  in  progress  for  vidicons  of  higher  resolution. 

Vidicon frame rate is limited by the lag characteristic of the photoconductive target. The fastest decay time of present vidicons is about 0.1 second (for 10% retention); the data rate is limited by this lag characteristic.


(3)  Image  Dissector 


This camera tube is of interest as an example of the non-storage type. Basically, it operates on the image intensifier principle. A photocathode emits electrons which are accelerated toward the rear of the tube. A pinhole aperture at the rear of the tube allows only a small stream of electrons to pass to the photomultiplier section. Deflection coils are used to shift the electron image through every possible condition of registry with respect to the aperture. Resolutions of about 3000 TV lines are available in these tubes, but adequate resolution, bandwidth, and signal-to-noise ratio cannot be obtained simultaneously. Image dissectors may be designed for particular applications, and are the nearest competitor in resolution capability to the flying spot scanner.

b.  Flying  Spot  Scanner 


A  reliable  device  for  scanning  transparencies  may  be  constructed 
by  imaging  a  cathode  ray  tube  raster  onto  the  transparency  and  collecting  the 
transmitted  light  in  a  photomultiplier.  Alternatively,  a  line  scan  may  be  used, 
with  mechanical  shifting  of  the  transparency  to  provide  scan  in  the  remaining 
dimension.  High  quality  flying  spot  scanning  requires  a  special  tube  with  high 
resolution, high intensity, and low phosphor grain noise. A line scan tube can be constructed of higher quality than a raster scan tube, as only one-dimensional correction is required.

(1)  Conventional  Scanners 


The conventional flying spot scanner utilizes a high-resolution CRT operating at high intensity; this is necessary to overcome photomultiplier noise. A single transparency may be scanned repeatedly, or a new one shifted into position after each scan. Some time is necessarily lost during the mechanical shifting operation. There are flying spot scanners that operate reliably with in excess of 2000 TV line resolution. The problem of mechanically shifting the transparency between frames is of such magnitude that an alternative should be sought.

(2)  Line  Scanner 


Either  a  conventional  scanner  tube  or  a  special  purpose  line 
scan  tube  may  be  used  to  generate  a  single  line  scan.  With  single  line  scan, 
continuous  mechanical  motion  is  necessary  to  provide  vertical  scanning. 
Continuous motion has the advantage of no loss in time between frames, and is simpler mechanically.

The chief disadvantage of line scan is that repeated scanning of the same line on the phosphor tends to cause heating, and possible damage to the phosphor or breakage of the tube. Thus, line brightness should not exceed the frame brightness limit given for frame scan.

(3)  Rotating  Phosphor  Tube  Scanner 

In  order  to  retain  the  advantage  of  line  scan,  and  at  the  same 
time  increase  brightness  capability,  a  rotating  drum  phosphor  may  be  used. 
Continuous  rotation  of  the  drum  permits  cooling  of  one  portion  of  the  phosphor 
while  another  portion  is  being  scanned.  The  brightness  capability  of  such  a 
system  is  about  30  times  greater  than  with  conventional  line  scan. 

A  rotating  drum  tube  is  available  that  is  capable  of  a  limiting 
resolution  of  up  to  9000  TV  lines,  and  which  utilizes  a  fast  P-16  phosphor. 
APPENDIX F


FLYING SPOT SCANNER FOR HIGH SIGNAL-TO-NOISE RATIO
AND HIGH RESOLUTION


1. Introduction

This appendix examines a flying spot scanner of the type required to scan large high-resolution transparencies for imagery screening. The purpose of this examination is to determine whether the desired resolution and signal-to-noise ratio performance may be realized in flying spot scanners of conventional design.

A functional diagram of the scanner considered is given in Figure F-1. The purpose of the auxiliary channel shown in this figure is to provide phosphor cancellation, i.e., to divide out any signal variations due to phosphor grain, phosphor fatigue, or other sources of non-uniformity prior to the first lens. Division is accomplished by logging the outputs of both channels and obtaining the difference. This difference is taken as the scanner output (instead of taking the inverse log).

2.  Flying  Spot  Cathode  Ray  Tubes 

Flying spot cathode ray tubes capable of 2500 TV line resolution on a 4-1/2-inch diameter screen are available from several manufacturers. These tubes are available with fast (P-16) phosphors to satisfy element rate requirements of up to 30 x 10^ elements/second.*

Power input to a flying spot scanner may be taken as the product of cathode ray beam current and accelerating potential. Typical maximum beam current is about 10 microamperes for most high-resolution tubes operating with reduced raster size; recommended accelerating potentials are between 8 and 27 kilovolts.

Useful light output from the cathode ray tube screen depends upon phosphor efficiency, which is typically 2 percent for P-16 operating at high resolution. This is the ratio of power emitted within the P-16 spectral response curve to total beam power. The spectral power distribution characteristic is an important consideration, since it must be utilized in determining lens transmission and photomultiplier response.


* Light output from a P-16 phosphor decays to 10 percent in about 0.1 microsecond; therefore, little degradation due to persistence is expected for video bandwidths below 5 megacycles.
3. Lens System


The lens system illustrated in the functional diagram (Figure F-1) comprises:

a.  The  objective  lens 

b.  A  beam  splitter 

c.  Main  channel  condenser 

d.  Cancellation  channel  condenser 

One-to-one magnification is assumed, resulting in a scanned region about 4-1/2 inches long on the transparency. Because image brightness is important in signal-to-noise ratio calculations, it is desirable to utilize an objective lens of as small an f-number (as large a relative aperture) as possible; typical lenses for this application range from f/2 to f/8. While the resolution capability of the objective lens is an important consideration, it is assumed here that 5000-element resolution is available in an f/3.5 lens. Another important property of the objective is its spectral transmission characteristic. Since the P-16 phosphor peaks at the violet end of the visual range, a lens optimized for the visual range would have poor total transmission of the P-16 emission. For efficient transmission, therefore, it is necessary to utilize a lens optimized for the P-16 characteristic.


An objective will intercept only a small percentage of the light emitted from the CRT screen; the actual ratio of collected power to total power may be determined from the formula

$\eta_c = \frac{1}{4f^2(m+1)^2}$ (F-1)

where $\eta_c$ is the collection efficiency, $f$ is the lens f-number, and $m$ is the magnification ratio. For $m = 1$,

$\eta_c = \frac{1}{16 f^2}$ (F-2)
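Equation F-1 is easy to tabulate. A short sketch (the function name is illustrative) evaluating it at unit magnification across the f/2 to f/8 range mentioned in the text:

```python
def collection_efficiency(f_number, magnification):
    """Fraction of CRT light collected by the objective, Equation F-1."""
    return 1.0 / (4.0 * f_number ** 2 * (magnification + 1.0) ** 2)


for f in (2.0, 3.5, 8.0):
    print(f"f/{f}: {collection_efficiency(f, 1.0):.5f}")
```

An f/2 lens collects 16 times as much light as an f/8 lens, which is why brightness favors the smallest practical f-number.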
A beam splitter is used immediately behind the objective lens to direct some light into the cancellation channel. About 5 percent of the light should take this path, and the remainder should continue in its normal path to form a raster image on the transparency. Condensing optics are utilized to form objective aperture images on both photomultipliers. It is important that all optical components transmit efficiently over the P-16 phosphor spectrum. A typical optical system loss of 50 percent may be assumed.

4.  Photographic  Data 

Typical photographic transparencies have densities ranging between 0.2 and 2.0; the corresponding transmission limits are about 0.60 to 0.01. It is difficult to obtain a good signal-to-noise ratio on high-density transparencies; therefore, it is advisable to ignore areas with transmission lower than 0.01, for example. This may be considered the black level.

5.  Photomultipliers 


Linear photomultipliers (such as the "Venetian blind" variety) are required for effective phosphor cancellation. The 8051 photomultiplier is a good example of this type. It has an S-11 spectral response that closely matches the P-16 phosphor spectrum and a high cathode sensitivity of 0.06 amperes per watt at its peak. This tube also has reasonably uniform sensitivity over the photocathode surface, an important consideration when phosphor cancellation is necessary.


Photocathode sensitivity ($\eta_k$) to P-16 radiation is

$$\eta_k = \eta_p\,\frac{\int f_1(\lambda)\,f_2(\lambda)\,d\lambda}{\int f_1(\lambda)\,d\lambda} \qquad \text{(F-3)}$$

where: $\eta_p$ = 0.06 amperes/watt = peak photomultiplier sensitivity
$f_1(\lambda)$ = P-16 spectral characteristic
$f_2(\lambda)$ = S-11 photocathode characteristic (relative to its peak).

This calculation has been performed graphically, giving

$$\eta_k \approx 0.85\,\eta_p, \qquad \text{(F-4)}$$

indicating a very good spectral match.
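The report evaluated Equation F-3 graphically. A numerical equivalent is a pair of trapezoidal-rule integrals, sketched below. The two bell-shaped curves are hypothetical stand-ins chosen only to make the example run. They are NOT measured P-16 or S-11 data, so the resulting factor (about 0.81) should not be compared with the report's 0.85. With tabulated spectra, the same function would apply unchanged.

```python
import math


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of samples ys taken at abscissae xs."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0 for k in range(len(xs) - 1))


def spectral_match(emission, response, wavelengths):
    """eta_k / eta_p of Equation F-3.

    emission: phosphor spectrum f1 (any scale); response: photocathode
    characteristic f2, normalized to 1.0 at its peak.
    """
    f1 = [emission(w) for w in wavelengths]
    product = [a * response(w) for a, w in zip(f1, wavelengths)]
    return trapezoid(product, wavelengths) / trapezoid(f1, wavelengths)


def gaussian(peak_nm, sigma_nm):
    """Bell-shaped stand-in spectrum with unit peak height."""
    return lambda w: math.exp(-0.5 * ((w - peak_nm) / sigma_nm) ** 2)


# Hypothetical curves only; real P-16 and S-11 data would be tabulated values.
phosphor = gaussian(385.0, 20.0)
cathode = gaussian(420.0, 60.0)
grid = [300.0 + 0.5 * k for k in range(601)]  # 300 to 600 nm
match = spectral_match(phosphor, cathode, grid)
print(f"spectral match factor = {match:.2f}")
print(f"eta_k = {0.06 * match:.4f} A/W for eta_p = 0.06 A/W")
```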
The photocathode is the dominant source of noise in any flying spot scanner; in fact, useful computations may be made by ignoring all other noise sources. The shot noise formula

$$N_s = \sqrt{2 e f i_k} \qquad \text{(F-5)}$$

is utilized to determine photocathode noise as a function of current. In Equation F-5,

$N_s$ = RMS noise current
$e$ = charge of an electron = 1.6 × 10^-19 coulombs
$f$ = video bandwidth in cycles per second
$i_k$ = photocathode current.

Photomultiplication raises the signal and noise levels linearly without introducing a significant amount of noise. It also provides an output current sufficiently high to override resistor and tube noise in succeeding circuitry.
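Equation F-5 is shown below in executable form. This is a minimal sketch that uses the text's value e = 1.6 × 10^-19 coulombs; the function names are ours. For an illustrative 1-microampere photocathode current in a 5-Mc video band, it gives about 1.26 × 10^-9 amperes rms, and the ratio of signal current to that noise is about 790.

```python
import math

ELECTRON_CHARGE = 1.6e-19  # coulombs, the value used in the text


def shot_noise_rms(current_a: float, bandwidth_hz: float) -> float:
    """RMS shot-noise current N_s = sqrt(2 e f i_k) of Equation F-5, in amperes."""
    return math.sqrt(2.0 * ELECTRON_CHARGE * bandwidth_hz * current_a)


def photocathode_snr(current_a: float, bandwidth_hz: float) -> float:
    """Signal-to-noise ratio i_k / N_s, which equals sqrt(i_k / (2 e f))."""
    return current_a / shot_noise_rms(current_a, bandwidth_hz)


i_k = 1e-6  # illustrative 1-microampere photocathode current
print(f"N_s = {shot_noise_rms(i_k, 5e6):.3e} A, S/N = {photocathode_snr(i_k, 5e6):.0f}")
```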

6.  Logging 

Logging accomplishes two functions in the scanner under consideration. It permits phosphor cancellation, and it introduces a non-linearity which causes output noise to remain more nearly constant as the signal output varies. This second advantage is more apparent when quantization of the output signal is necessary.

Phosphor decay compensation may be applied to both channels prior to logging; the two circuits should be linear and identical.

7.  Differential  Amplifier 

This circuit merely subtracts cancellation channel output from main channel output to produce a signal free of phosphor grain noise and the other defects mentioned earlier.

8.  Signal  and  Noise  Considerations 

Photomultiplier signal-to-noise ratio may be calculated easily as

$$\frac{S}{N} = \frac{i_k}{N_s} = \sqrt{\frac{i_k}{2ef}}. \qquad \text{(F-6)}$$

If very high signal-to-noise ratios are assumed, logging results in an output that is the logarithm of signal input plus noise of the same form as at the input but changed in magnitude (see Figure F-2). Noise amplitude is multiplied by the slope of the log characteristic. To illustrate this process, let the output current of a logarithmic device be represented as

$$i' = k \log i. \qquad \text{(F-7)}$$

Then

$$n' \approx k\,n\,\frac{d(\log i)}{di} = \frac{k}{i}\,n. \qquad \text{(F-8)}$$

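It is next required to determine what happens to signal and noise in the differential amplifier. Assume that subtraction is performed without loss or amplification. The inputs to the difference process are the main channel log signal plus noise (assumed Gaussian) and the cancellation channel log signal plus noise (assumed Gaussian). The second is subtracted from the first. Because the two noises arise in separate photocathodes, they are independent and add in quadrature:

$$i'_m - i'_c = k(\log i_m - \log i_c) = k \log\frac{i_m}{i_c} \qquad \text{(F-9)}$$

$$n'_{\text{out}} = \sqrt{n_m'^2 + n_c'^2} = k\sqrt{\left(\frac{n_m}{i_m}\right)^2 + \left(\frac{n_c}{i_c}\right)^2}. \qquad \text{(F-10)}$$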

Note  that  the  same  log  characteristic  was  utilized  in  main  and  cancellation 
channels. 
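Equations F-8 to F-10 rest on a small-noise approximation. A minimal sketch of them appears below, together with a Monte Carlo check that actually adds Gaussian noise to each channel before logging and differencing. The channel currents and noise levels in the check are illustrative values, not the report's.

```python
import math
import random
import statistics


def log_difference(i_m, n_m, i_c, n_c, k):
    """Differenced log signal and its RMS noise, small-noise form of F-8 to F-10.

    Each channel's noise is scaled by the slope k / i of k ln(i); the two
    independent Gaussian noises then add in quadrature.
    """
    signal = k * math.log(i_m / i_c)
    noise = k * math.hypot(n_m / i_m, n_c / i_c)
    return signal, noise


# Monte Carlo check with illustrative (not report) values.
random.seed(1)
k, i_m, n_m, i_c, n_c = 1.0, 1.0, 0.01, 0.05, 0.0005
outputs = [
    k * (math.log(i_m + random.gauss(0.0, n_m)) - math.log(i_c + random.gauss(0.0, n_c)))
    for _ in range(20000)
]
_, predicted = log_difference(i_m, n_m, i_c, n_c, k)
print(f"predicted noise {predicted:.5f}, simulated {statistics.stdev(outputs):.5f}")
```

The two figures agree to within a fraction of a percent here because both relative noise levels are only 1 percent, which is the "very high signal-to-noise" condition the text assumes.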

9.  Representative  Performance  Calculations 

Figure F-3 represents in block form the operations that are performed in a flying spot scanner with phosphor grain cancellation. The functions follow closely those enumerated in the foregoing sections, with two exceptions: (1) the spectral match factor block is placed immediately after the phosphor efficiency block, which makes separate computations for each photomultiplier unnecessary; and (2) light loss in the main channel due to beam splitting is neglected, since this loss is lumped with the assumed lens system losses.

The following parameters are assumed for the representative system:

cathode-ray accelerating potential               27 kV
beam current                                     10 µA
phosphor efficiency                              2%
spectral match (fixed)                           85%
objective collection efficiency (f/3.5 lens)     0.005
lens system losses                               50%
portion of light to cancellation channel         5%
transmission range of film                       1% to 60%
photocathode efficiency                          0.06 A/W
video bandwidth                                  5 Mc*

Figure F-2  Noise in Logarithmic Amplification

Figure F-3  Block Diagram of Essential Scanner Functions

Utilizing these figures, cathode-ray beam power is 27 × 10^3 × 10 × 10^-6 = 0.27 watts; effective light output is two percent of this, or 5.4 milliwatts. In terms of photomultiplier response, 85 percent of this power is usable, resulting in an effective radiant screen output of approximately 4.6 milliwatts. The amount of radiation intercepted by the objective lens is

$$W_o = \frac{4.6\times10^{-3}}{16\,(3.5)^2} = 23.5 \ \text{microwatts}. \qquad \text{(F-11)}$$


Since 50 percent of this power is assumed to be lost in the optical system, available main channel radiant power to the film is 11.8 microwatts, and $W_m = 11.8\,\tau$ microwatts are delivered to the main channel photocathode (where $\tau$ is film transmission). Power delivered to the cancellation channel photomultiplier is five percent of 11.8 microwatts, i.e., $W_c$ = 0.590 microwatts.

The photomultiplier conversion efficiency of 0.06 amperes per watt gives corresponding signal currents of

$$i_m = 0.06 \times 11.8\,\tau = 0.708\,\tau \ \mu\text{A} \qquad \text{(F-12)}$$

$$i_c = 0.06 \times 0.59 = 0.0354 \ \mu\text{A}. \qquad \text{(F-13)}$$
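The power budget from beam power to photocathode current can be chained in a few lines, as the minimal sketch below shows. The default arguments restate the parameter table, and the function name `channel_currents` is hypothetical. As in the text, beam-splitting loss in the main channel is lumped into the 50 percent optical loss. Carrying unrounded intermediate values gives about 0.703τ µA and 0.0351 µA, within 1 percent of the text's 0.708τ and 0.0354 µA, which use the rounded 4.6 mW and 11.8 µW.

```python
def channel_currents(tau, accel_v=27e3, beam_a=10e-6, phosphor_eff=0.02,
                     spectral_match=0.85, f_number=3.5, optics_loss=0.5,
                     cancel_fraction=0.05, cathode_a_per_w=0.06):
    """Main- and cancellation-channel photocathode currents, in amperes."""
    beam_w = accel_v * beam_a                              # 0.27 W
    screen_w = beam_w * phosphor_eff * spectral_match      # about 4.6 mW
    collected_w = screen_w / (16.0 * f_number**2)          # F-11, unit magnification
    film_w = collected_w * (1.0 - optics_loss)             # about 11.8 microwatts
    i_m = cathode_a_per_w * film_w * tau                   # F-12
    i_c = cathode_a_per_w * film_w * cancel_fraction       # F-13
    return i_m, i_c


i_m, i_c = channel_currents(tau=1.0)
print(f"i_m = {i_m * 1e6:.3f} tau uA, i_c = {i_c * 1e6:.4f} uA")
```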


* Phosphor decay compensation is assumed to have a negligible effect on performance calculations for a P-16 phosphor at 5 megacycles video bandwidth.
The shot noise formula (Equation F-5) gives RMS noise current at each photocathode as a function of signal current, i.e.,

$$n = \sqrt{2 \times 1.6\times10^{-19} \times 5\times10^{6}\; i} = 1.265\times10^{-6}\sqrt{i} \ \text{amperes}, \qquad \text{(F-14)}$$

with $i$ in amperes.


Thus, rms noise current may be determined for the main channel as a function of film transmission, and for the cancellation channel as a fixed value:

$$n_m = 1.265\times10^{-6}\sqrt{0.708\times10^{-6}\,\tau} = 1.06\times10^{-9}\sqrt{\tau} \ \text{amperes} \qquad \text{(F-15)}$$

$$n_c = 1.06\times10^{-9}\sqrt{0.05} = 2.37\times10^{-10} \ \text{amperes}. \qquad \text{(F-16)}$$


Main channel signal-to-noise ratio on highlights (60 percent film transmission) may be determined at this point:

$$\frac{S}{N} = \frac{0.7\times10^{-6}\times0.6}{1.06\times10^{-9}\sqrt{0.6}} \approx 510 \text{ to } 1 = 54.1 \ \text{decibels}. \qquad \text{(F-17)}$$

This is the photomultiplier output signal-to-noise ratio without phosphor cancellation. We continue with the analysis to determine signal and noise characteristics after logging and differencing the two channels.
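The highlight calculation of Equation F-17 is sketched below, with the text's values for charge and bandwidth (the function name is ours). Using the unrounded 0.708 µA per unit transmission gives about 515 to 1 (54.2 db). The text's 510 to 1 (54.1 db) comes from the rounded 0.7 µA and 1.06 × 10^-9 figures.

```python
import math

E_CHARGE = 1.6e-19   # coulombs
BANDWIDTH = 5e6      # cycles per second


def main_channel_snr(tau, amps_per_unit_tau=0.708e-6):
    """Photomultiplier output S/N of the main channel before cancellation."""
    i_m = amps_per_unit_tau * tau                       # F-12
    n_m = math.sqrt(2.0 * E_CHARGE * BANDWIDTH * i_m)   # F-14 / F-15
    return i_m / n_m


ratio = main_channel_snr(0.6)
print(f"{ratio:.0f} to 1 = {20.0 * math.log10(ratio):.1f} db")
```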

Note that the constant $k$ appears as a factor in both Equations F-9 and F-10. This factor will assume a definite value in any actual scanner, depending upon the amount of linear amplification prior to logging and upon the current units selected. It is useful to determine the value of $k$ that results in unity output signal excursion as film transmission varies from its minimum to its maximum level. The results so obtained are useful regardless of the amount of linear amplification and the log characteristic utilized. The following discussion assumes mathematical logging, and any logging device utilized should approximate this characteristic closely over its operating range.

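To ascertain $k$ for film transmission variations between one percent and 60 percent, we set

$$k\left[(\log i_{m\,\max} - \log i_{m\,\min}) - (\log i_{c\,\max} - \log i_{c\,\min})\right] = 1. \qquad \text{(F-18)}$$

But since $i_c$ is invariant, it suffices to write

$$k \log\frac{i_{m\,\max}}{i_{m\,\min}} = 1, \qquad \text{(F-19)}$$

or

$$k = \frac{1}{\log(i_{m\,\max}/i_{m\,\min})} = \frac{1}{\log(\tau_{\max}/\tau_{\min})} = \frac{1}{\log 60} = 0.244, \qquad \text{(F-20)}$$

with log denoting the natural logarithm (1/ln 60 = 0.244).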
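The value 0.244 identifies the logarithm as natural, and a two-line check makes this explicit. With common logarithms the same normalization would give k = 0.562. The normalized output is identical either way, because k absorbs the base.

```python
import math

tau_min, tau_max = 0.01, 0.60
k_natural = 1.0 / math.log(tau_max / tau_min)    # F-20 with natural logarithms
k_common = 1.0 / math.log10(tau_max / tau_min)   # same normalization using log10
print(f"k (ln) = {k_natural:.3f}, k (log10) = {k_common:.3f}")
```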


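Setting signal output to zero at $i_m = i_{m\,\min}$, the expression for normalized signal output ($S_{\text{out}}$) is written as

$$S_{\text{out}} = 0.244\left[\log i_m - \log i_{m\,\min}\right] = 0.244 \log\frac{i_m}{i_{m\,\min}}. \qquad \text{(F-21)}$$

The expression for noise output ($N_{\text{out}}$) at the same $i_m$ is

$$N_{\text{out}} = 0.244\left[\left(\frac{n_m}{i_m}\right)^2 + \left(\frac{n_c}{i_c}\right)^2\right]^{1/2}. \qquad \text{(F-22)}$$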


Equations F-21 and F-22 may be written in terms of the film transmission, $\tau$, by substituting current equivalents from Equations F-12, F-13, F-15, and F-16. The resulting expressions for signal and noise output are

$$S_{\text{out}} = 0.244 \log 100\tau \qquad \text{(F-23)}$$

and

$$N_{\text{out}} = 0.244\times10^{-3}\left[\frac{2.24}{\tau} + 44.8\right]^{1/2}. \qquad \text{(F-24)}$$


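It is now a simple matter to plot the equivalent output signal-to-noise ratio ($R$) in decibels as a function of $\tau$, taking the signal as the full (unity) output excursion:

$$R_{\text{db}} = 20\log_{10}\frac{1}{N_{\text{out}}} \qquad \text{(F-25)}$$

$$= 72.24 - 10\log_{10}\left(\frac{2.24}{\tau} + 44.8\right). \qquad \text{(F-26)}$$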
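Equations F-23 to F-26 can be tabulated directly, as the sketch below does (function names ours). The constants 2.24 and 44.8 are the squared relative noises (n_m/i_m)^2 = 2.24 × 10^-6/τ and (n_c/i_c)^2 = 44.8 × 10^-6 from Equations F-12, F-13, F-15, and F-16. The sketch shows R rising from about 48 db at 1 percent transmission to about 55.4 db at 60 percent. At low transmission the main-channel term dominates, while at high transmission the fixed cancellation-channel noise sets the limit.

```python
import math

K = 0.244  # normalization constant from Equation F-20


def signal_out(tau):
    """Normalized output signal, Equation F-23 (natural log; 0 at tau = 0.01)."""
    return K * math.log(100.0 * tau)


def noise_out(tau):
    """RMS output noise after logging and differencing, Equation F-24."""
    return K * 1e-3 * math.sqrt(2.24 / tau + 44.8)


def output_snr_db(tau):
    """Full-range signal to rms noise in decibels, Equations F-25 and F-26."""
    return 20.0 * math.log10(1.0 / noise_out(tau))


for tau in (0.01, 0.1, 0.6):
    print(f"tau = {tau:4.2f}: S_out = {signal_out(tau):.3f}, R = {output_snr_db(tau):.1f} db")
```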




Plots of output signal as well as output signal-to-noise ratio as functions of $\tau$ are included in Figure F-4 for the scanner examined in this section. This figure is also very useful in determining signal-to-noise ratio for output signals smaller than the full range. Let $(S/N)_P$ represent signal-to-noise ratio in decibels for a signal covering P% of the output range. Then

$$(S/N)_P = R + 20\log_{10}\frac{P}{100} = R - 40 + 20\log_{10} P. \qquad \text{(F-27)}$$

For example, a signal covering 1 percent of the output range (about the smallest discernible change in a typical display) would result in signal-to-noise ratios 40 db less than those of Figure F-4.
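Equation F-27 is a one-line correction, sketched below (function name ours). The 55.4-db input used in the example is the highlight value that Equation F-26 gives at τ = 0.6.

```python
import math


def partial_range_snr_db(full_range_db: float, percent: float) -> float:
    """S/N in db for a signal spanning `percent` of the output range, Equation F-27."""
    return full_range_db + 20.0 * math.log10(percent / 100.0)


# A 1-percent step at the highlight end (R of about 55.4 db at tau = 0.6):
print(f"{partial_range_snr_db(55.4, 1.0):.1f} db")
```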


Figure F-4  Output and Signal-to-Noise Ratio of a Typical High Resolution Flying Spot Scanner for High Signal-to-Noise Performance (horizontal axis: film transmission, τ)
APPENDIX G


DETAILS  OF  GLASS  DELAY  LINE  CORRELATOR 


1.  Bit  Rates  and  Resolution 


A typical glass delay line and transducer which is available from Corning, and which has been demonstrated to Philco, has a center frequency of 15 megacycles and a bandwidth of 7 to 10 megacycles. In systems which read out a true reproduction of the amplitude-modulated carrier, an additional limiting factor on the resolution is imposed by the effect of the slit used for read-out. A single slit represents a different fraction of a wavelength for the higher-frequency sidebands than for the carrier or the lower sidebands. Present single-slit read-out uses a slit about one third of a carrier wavelength wide.

A 10-megabit data rate implies an ability to reproduce a 5-megacycle square wave with reasonable fidelity. If a video 3-db bandwidth of 5 Mc is available, a train of binary input signals and the corresponding propagated waves appear as represented in Figure G-1.

The  5  megacycle  bandwidth  restriction  affects  the  ability  of  each  bit 
representation  to  be  independent  of  its  predecessor.  For  example,  the  actual 
representation  of  zero  is  different  in  the  eighth  bit  position  in  the  sample  train 
than  it  is  in  the  second. 
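The sketch below illustrates how a limited bandwidth makes each bit depend on its predecessors. It passes a 10-megabit-per-second on/off train through a single-pole low-pass filter with a 5-Mc 3-db bandwidth and records the level at the end of each bit period. The single-pole model is our assumption, a crude stand-in for the envelope response of the actual transducer and line, which is band-pass about the 15-Mc carrier. The bit pattern is arbitrary and is not the one in Figure G-1.

```python
import math


def end_of_bit_levels(bits, bit_rate_hz=10e6, bandwidth_hz=5e6, steps_per_bit=50):
    """Envelope level at the end of each bit for a one-pole low-pass channel.

    The input is held constant over each bit, so each time step applies the
    exact exponential update y += (1 - exp(-dt / RC)) * (x - y).
    """
    dt = 1.0 / (bit_rate_hz * steps_per_bit)
    alpha = 1.0 - math.exp(-2.0 * math.pi * bandwidth_hz * dt)  # RC = 1 / (2 pi B)
    level, levels = 0.0, []
    for bit in bits:
        for _ in range(steps_per_bit):
            level += alpha * (bit - level)
        levels.append(level)
    return levels


train = [1, 0, 1, 1, 0, 0, 0, 1]
for position, (bit, level) in enumerate(zip(train, end_of_bit_levels(train)), start=1):
    print(f"bit {position}: sent {bit}, received level {level:.3f}")
```

In this model, a "0" that immediately follows a "1" ends about 4 percent above a "0" preceded by further zeros. This is the same kind of history dependence described above.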

Several  ways  of  improving  the  resolution  seem  possible.  A  single  sideband 
driver  might  be  designed  to  make  better  use  of  the  available  device  bandwidth, 
or  the  pole  pattern  of  the  driving  amplifier  could  be  arranged  to  provide  a  wider 
overall  characteristic. 

A third possibility for improvement exists in the device itself. The curve of light transmission versus stress in the glass bar is nonlinear. In Figure G-1 the transfer characteristic is represented. Ordinarily, a bias plate is added in the optical path to place the operating point at 50% transmission, where best linearity is obtained. In one of the proposed systems, however, the appropriate operating point is around zero transmission in order to produce rectification and therefore off-on transmission.

The effect of the biasing on the sample pulse train is represented in Figure G-1. The nonlinearity of the transfer characteristic has actually improved the ability to distinguish "0" from "1". If it were further possible to drive hard enough to enter the nonlinear portion of the curve at the high-transmission end, a further improvement could be obtained.
Figure G-1  Wave Propagation in Glass Delay Line


For a system utilizing slit read-out for retention of the carrier information, overdrive of the line might provide a corresponding improvement. In fact, the bias plate might be more judiciously chosen to adjust the operating point to effect double clipping of the envelope information. The ability to double clip will also be improved by using a wider glass slab and higher driving power.

In summary, there is some improvement in definition to be had by special driving and operating techniques. Even without these, however, it is possible to take the poor definition into account in the design of the mask sets by duplicating it in the input data used for mask evolution.

2.  Drive  Circuitry 

The provision of adequate drive circuits for the transducer is a problem of some difficulty. About 20 watts of power must be delivered to a load of approximately 4 ohms in parallel with 4,000 pF at a carrier frequency of 15 megacycles. In addition, some bandwidth compensation should be attempted, which will require even more power at the sidebands. A typical transducer bandwidth at 15 megacycles is 7 to 10 Mc. Operating the transducer at a harmonic will not increase the bandwidth but may make it easier to design compensation circuits at the reduced bandwidth percentage.
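A short calculation shows why this load is difficult. The sketch below treats the transducer as exactly the stated 4 ohms shunted by 4,000 pF, which is a simplification. At 15 Mc the capacitance has a reactance of only about 2.65 ohms. Unless the capacitance is tuned out, the driver must therefore supply roughly 3.4 amperes of reactive current on top of the 2.2 amperes that deliver the 20 watts.

```python
import math


def drive_load(power_w=20.0, r_ohm=4.0, c_farad=4000e-12, f_hz=15e6):
    """Voltage and branch currents for a resistor shunted by a capacitor at one frequency."""
    v_rms = math.sqrt(power_w * r_ohm)               # P = V^2 / R across the resistive part
    x_c = 1.0 / (2.0 * math.pi * f_hz * c_farad)     # capacitive reactance, ohms
    i_r = v_rms / r_ohm                              # in-phase (power) current
    i_c = v_rms / x_c                                # quadrature (reactive) current
    return v_rms, x_c, i_r, i_c


v, xc, ir, ic = drive_load()
print(f"V = {v:.2f} V rms, Xc = {xc:.2f} ohm, I_R = {ir:.2f} A, I_C = {ic:.2f} A")
print(f"reactive power = {v * ic:.1f} var against {v * ir:.1f} W real")
```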

For  a  feasibility  study  a  vacuum  tube  driver  provides  the  easiest  solution, 
although  a  solid  state  driver  developed  by  Wiley  Electronics  at  a  cost  of  $4800 
can  probably  be  obtained  through  Corning.  It  incorporates  no  compensation, 
however,  and  it  seems  more  desirable  to  attempt  a  vacuum  tube  design  for 
our  purposes. 

A  high  power  pulsed  oscillator*  is  available  for  initial  measurements  on 
the  delay  line. 

3.  Analog  and  Binary  Light  Modulation 

Most of the experimental work with the optically tapped glass delay line has utilized linear operation. The optical set-up is as shown in Figure 4-6. The transfer characteristic is most linear at the 50% transmission point; the function of the quarter-wave bias plate is to set operation around this point.

A slit narrower than approximately λ/3 is used to provide the read-out, which is an accurate reproduction of the stress wave propagated in the glass, i.e., an amplitude-modulated carrier. Recovery of the envelope alone then requires a normal RF detector.


*  Arenberg  PG650C  -  cost  $1200 
The choice of the 50% bias point is made for two reasons: this part of the transfer characteristic is most linear, and this is also the point of maximum sensitivity.

For  binary  operation  the  same  set-up  can  be  used  with  subsequent  detection 
of  the  RF  signal.  If  the  glass  delay  line  is  to  be  used  for  on-off  light  modulation, 
however,  it  becomes  necessary  to  produce  this  rectification  in  the  device.  The 
operating  point  can  be  shifted  to  the  non-linear  portions  of  the  curve. 

4.  Photosensor  Collection  Systems 


A  practical  problem  exists  in  collecting  the  light  spread  over  a  15"  length 
in  order  to  properly  illuminate  a  1/2"  photomultiplier  input.  The  neatest 
solution  is  undoubtedly  a  fiber  optics  assembly  to  provide  the  necessary  change 
and  the  mechanical  freedom  necessary  for  physically  arranging  100  viewing 
photomultipliers  (one  per  class).  Each  unit  of  the  assembly  would  consist  of 
a  15"  long  array,  a  fraction  of  an  inch  high,  composed  of  uniformly  distributed 
fibers  at  one  end  which  are  gathered  into  a  round  or  square  cross  section 
bundle  at  the  other  end. 

Corning  utilized  an  array  of  6  units,  0.  6"  high  by  30  mils  wide  ending  in 
circular  cross  sections  for  simultaneous  6-slit  readout.  The  assembly  was 
furnished  by  Bausch  and  Lomb  for  about  $200.  This  gives  some  indication  of 
cost  and  feasibility  for  the  kind  of  assembly  under  discussion. 

5.  Homogeneity  of  Delay  Line 

Discussion with Corning's engineers indicates that good uniformity in the delay line can be expected throughout the individual unit. Also, the temperature coefficient of expansion is small enough (80 ppm) not to cause difficulty. Some problem may be experienced in matching one delay line against another. Plots of read-out signal amplitude over the length of the line show less than 3 db variation (using slit read-out) and verify the uniformity of the line. The uniformity on the output face in the direction normal to the length is not as good, however, due to the radiation pattern of the transducer. In addition, since the transducer covers only about 80% of the cross section of the glass, the edges of the line are not usable for read-out. A plot of read-out photomultiplier current shows a variation of about ±30 to ±50% across the usable width. It may be necessary to provide a compensating mask or to arrange the optical system in such a way that each mask pair views the whole delay line face. If the system requires it, there appears to be sufficient symmetry to count on a division of the face along the centerline, with the top half for positive masks and the bottom half for negative masks.
APPENDIX H


STATISTICAL  METHODS  FOR  PATTERN  CLASSIFICATION  I 


To  make  this  report  as  self-contained  as  possible,  this  appendix  presents 
a  number  of  items  related  to  the  tutorial  discussion  on  statistical  methods  for 
pattern  classification. 

1.  The  Likelihood  Ratio 


For the case of two groups, when the distributions of the populations from which samples are obtained are completely specified, Welch* showed in 1939 that a general discriminant function is the likelihood ratio of the two hypotheses. Suppose a number of measurements are made on a pattern and, on the basis of the measurements,

$$x = (x_1, x_2, \ldots, x_N),$$

it is desired to classify the pattern into one of the two groups to which it can possibly belong. We can think of the universe of patterns as an N-dimensional space and the task of classification as one of dividing this N-dimensional space into two mutually exclusive regions, $R_1$ and $R_2$. When a particular measurement $x$ falls in $R_1$, the pattern is listed under Group 1, and when $x$ falls in $R_2$, the pattern is listed under Group 2.

Let the proportions of the two groups in the universe from which the patterns are obtained be $q_1, q_2$, where $q_1 + q_2 = 1$. If $f_1(x)$ and $f_2(x)$ represent the probability density functions for the two populations, then the probability of making an error in the classification of patterns into the two groups is

$$P = q_1\int_{R_2} f_1(x)\,dx + q_2\int_{R_1} f_2(x)\,dx, \qquad \text{(H-1)}$$

where $dx$ is the element of volume ($dx_1\,dx_2 \cdots dx_N$). Since $\int_{R_1} f_2\,dx = 1 - \int_{R_2} f_2\,dx$, the expression in H-1 may be rewritten as

$$P = q_2 + \int_{R_2} (q_1 f_1 - q_2 f_2)\,dx, \qquad \text{(H-2)}$$


* Welch, B. L., "Note on Discriminant Functions," Biometrika, Vol. 31, pp. 218-220, 1939.
it is easy to see that the probability of misclassification is minimized by putting all those points in the decision space for which $q_1 f_1 < q_2 f_2$ into the Region $R_2$. Thus the regions which give the minimum value for the probability of misclassification are defined in terms of the likelihood ratio $f_1/f_2$:

$$R_1:\ \frac{f_1(x)}{f_2(x)} \ge \frac{q_2}{q_1}, \qquad R_2:\ \frac{f_1(x)}{f_2(x)} < \frac{q_2}{q_1}. \qquad \text{(H-3)}$$

From H-3 we see that when $q_1$ and $q_2$ are known and fixed, the regions $R_1$ and $R_2$ are separated by a boundary along which the likelihood ratio has a constant value. Here points on the boundary have been arbitrarily assigned to $R_1$.
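The minimum-error rule H-3 is sketched below for two hypothetical one-dimensional normal populations. The densities and proportions are illustrative only, not taken from the report. The comparison is written as q1 f1(x) >= q2 f2(x), which is equivalent to the ratio test whenever f2(x) > 0 and avoids dividing by a zero density. Ties go to Group 1, as in the text.

```python
import math


def normal_pdf(x, mean, sd):
    """One-dimensional normal density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def f1(x):
    return normal_pdf(x, 0.0, 1.0)   # hypothetical Group 1 density


def f2(x):
    return normal_pdf(x, 2.0, 1.0)   # hypothetical Group 2 density


def min_error_group(x, dens1, dens2, q1, q2):
    """Rule H-3: Group 1 if q1 f1(x) >= q2 f2(x), otherwise Group 2."""
    return 1 if q1 * dens1(x) >= q2 * dens2(x) else 2


for q1 in (0.5, 0.9):
    groups = [min_error_group(x, f1, f2, q1, 1.0 - q1) for x in (0.0, 1.0, 1.5, 2.0)]
    print(f"q1 = {q1}: {groups}")
```

Raising q1 moves the boundary toward the Group 2 mean (from x = 1 to about x = 2.1 here), so that the more common group claims more of the measurement space.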


2.  Some  Decision  Criteria 


Situations in which different misclassifications have different consequences may arise. Such situations are formulated in terms of Wald's theory of decision functions,* which he presented in 1939 for finite and infinite alternatives. In 1945, Von Mises** gave the solution which minimizes the maximum error of classification for the case of finite alternatives. Further results and applications of decision theory to classification problems were given in a series of papers by Rao.***

Let $C_{21}$ be the loss incurred when a pattern from Group 1 is assigned to Group 2, and $C_{12}$ the loss resulting from assigning a pattern from Group 2 to Group 1. Then the expected loss is

$$q_1 C_{21}\int_{R_2} f_1\,dx + q_2 C_{12}\int_{R_1} f_2\,dx \qquad \text{(H-4)}$$


* Wald, A., "Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Ann. Math. Stat., Vol. 10, pp. 299-326, 1939.

** Von Mises, R., "On the Classification of Observation Data into Distinct Groups," Ann. Math. Stat., Vol. 16, pp. 68-73, 1945.

*** Rao, C. R., "Statistical Inference Applied to Classificatory Problems," Sankhyā, Vol. 10, pp. 229-256, 1950; Vol. 11, pp. 107-116, 1951; Vol. 12, pp. 229-246, 1952-1953.
and the solution which minimizes this is

$$R_1:\ \frac{f_1}{f_2} \ge \frac{q_2 C_{12}}{q_1 C_{21}}, \qquad R_2:\ \frac{f_1}{f_2} < \frac{q_2 C_{12}}{q_1 C_{21}}. \qquad \text{(H-5)}$$

Note that only the value of the threshold $t$, against which the likelihood ratio is compared, has changed. The above value of the threshold resulted from using the Bayes criterion, viz., minimization of the expected loss of classification. Here we had tacitly assumed that no gain resulted from a correct decision. If $C_{11}$ and $C_{22}$ denote the losses (negative for gains) attached to correct assignments of Group 1 and Group 2 patterns, the general result using a Bayes criterion gives

$$t = \frac{q_2\,(C_{12} - C_{22})}{q_1\,(C_{21} - C_{11})}. \qquad \text{(H-6)}$$


For a priori probabilities and costs for which $t$ becomes 1, we obtain the maximum-likelihood criterion. When a priori probabilities are not known, the principle of maximum likelihood leads to a Bayes procedure with equal a priori probabilities assigned to each alternative. Another possibility is the minimax decision rule,* which is the Bayes rule relative to the a priori distribution for which the expected loss is a maximum. In all these cases the decision rule is

$$R_1:\ \frac{f_1(x)}{f_2(x)} \ge t, \qquad R_2:\ \frac{f_1(x)}{f_2(x)} < t. \qquad \text{(H-7)}$$
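The Bayes threshold of Equation H-6 and the threshold test are sketched below. The numbers in the example are hypothetical. They treat Group 1 as rare objects of interest that are ten times as costly to miss as a false alarm is to pass on, which is the kind of asymmetry a screening device would face. In the sketch, c_ij is the loss of assigning a Group-j pattern to Group i, following the text's C_21 and C_12.

```python
def bayes_threshold(q1, q2, c12, c21, c11=0.0, c22=0.0):
    """Threshold t of Equation H-6.

    c21 is the cost of assigning a Group-1 pattern to Group 2 (a miss),
    c12 the reverse; c11 and c22 apply to correct decisions.
    """
    return q2 * (c12 - c22) / (q1 * (c21 - c11))


def decide(likelihood_ratio, t):
    """Threshold test: Group 1 if f1/f2 >= t, otherwise Group 2."""
    return 1 if likelihood_ratio >= t else 2


# Hypothetical screening case: Group 1 is rare (q1 = 0.1) but costly to miss.
t = bayes_threshold(q1=0.1, q2=0.9, c12=1.0, c21=10.0)
print(f"t = {t:.2f}; ratio 0.5 -> Group {decide(0.5, t)}, ratio 2.0 -> Group {decide(2.0, t)}")
```

With equal priors and equal costs the threshold is 1 (the maximum-likelihood case). Here the high cost of a miss largely offsets the rarity of Group 1, and the threshold settles at 0.9.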


3.  Likelihood  Ratio  for  Multivariate  Normal  Distributions 


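The joint probability density function of N gaussian real random variables $x_i$ with means $m_i$ and covariances $v_{ij}$ is

$$f(x) = \frac{1}{(2\pi)^{N/2}|V|^{1/2}}\exp\left[-\frac{1}{2|V|}\sum_{i=1}^{N}\sum_{j=1}^{N}|V|_{ij}\,(x_i - m_i)(x_j - m_j)\right],$$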

*  Anderson,  T.  W.,  "An  Introduction  to  Multivariate  Statistical  Analysis,  " 
New  York,  John  Wiley,  1958.  Chapter  6. 
where $|V|_{ij}$ is the cofactor of the element $v_{ij}$ in the determinant $|V|$ of the covariance matrix

$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1N} \\ v_{21} & v_{22} & \cdots & v_{2N} \\ \vdots & \vdots & & \vdots \\ v_{N1} & v_{N2} & \cdots & v_{NN} \end{bmatrix},$$

in which

$$v_{ij} = E\left[(x_i - m_i)(x_j - m_j)\right].$$


Denoting the vector of means by

$$M = (m_1, m_2, \ldots, m_N),$$

the joint probability density in matrix notation becomes

$$f(x) = \frac{1}{(2\pi)^{N/2}|V|^{1/2}}\exp\left[-\frac{1}{2}(X - M)'\,V^{-1}(X - M)\right].$$

Now suppose N measurements are made on a pattern which can belong to one of two groups, in which (a) the measurements are normally distributed (no assumption of statistical independence of measurements is made) and (b) the covariance matrices for the two groups are equal. Then the likelihood ratio is the ratio of two multivariate normal density functions which differ only in their mean vectors:


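$$\frac{f_1(x)}{f_2(x)} = \frac{\exp\left\{-\frac{1}{2}(x - M^{(1)})'V^{-1}(x - M^{(1)})\right\}}{\exp\left\{-\frac{1}{2}(x - M^{(2)})'V^{-1}(x - M^{(2)})\right\}} = \exp\left\{-\frac{1}{2}\left[(x - M^{(1)})'V^{-1}(x - M^{(1)}) - (x - M^{(2)})'V^{-1}(x - M^{(2)})\right]\right\}.$$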


The condition of H-7 may now be applied to determine the set of x's which should be classified into Group 1 and Group 2, respectively. Taking the logarithm of the last expression gives, for the boundary between $R_1$ and $R_2$:

$$-\frac{1}{2}\left[(x - M^{(1)})'V^{-1}(x - M^{(1)}) - (x - M^{(2)})'V^{-1}(x - M^{(2)})\right] = \log t,$$

which, after rearranging terms, becomes

$$x'V^{-1}(M^{(1)} - M^{(2)}) = \log t + \frac{1}{2}(M^{(1)} + M^{(2)})'V^{-1}(M^{(1)} - M^{(2)}). \qquad \text{(H-8)}$$

The terms on the right-hand side of the above expression can be represented by a constant $C$, so that the regions $R_1$ and $R_2$ in the N-dimensional space are separated by a hyperplane whose equation is

$$x'V^{-1}(M^{(1)} - M^{(2)}) = C,$$

or, in scalar form,

$$\sum_{i=1}^{N} a_i x_i = C,$$

where

$$a_i = \sum_{j=1}^{N} v^{ij}\left(m_j^{(1)} - m_j^{(2)}\right)$$

and $v^{ij}$ $(i, j = 1, 2, \ldots, N)$ are elements of $V^{-1}$.
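The equal-covariance boundary H-8 is sketched below with NumPy. The means and covariance are hypothetical. The weight vector V^-1(M1 - M2) is obtained with `numpy.linalg.solve` rather than by forming the inverse explicitly, a standard numerical choice rather than anything the report prescribes. Comparing x'w with C is equivalent to comparing the log likelihood ratio with log t.

```python
import numpy as np


def linear_boundary(m1, m2, cov, t=1.0):
    """Weights w = V^-1 (M1 - M2) and constant C of Equation H-8 (natural log)."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    w = np.linalg.solve(np.asarray(cov, dtype=float), m1 - m2)  # no explicit inverse
    c = np.log(t) + 0.5 * (m1 + m2) @ w
    return w, c


def classify(x, w, c):
    """Group 1 when x' V^-1 (M1 - M2) >= C, otherwise Group 2."""
    return 1 if np.asarray(x, dtype=float) @ w >= c else 2


m1, m2 = [1.0, 1.0], [0.0, 0.0]          # hypothetical mean vectors
cov = [[1.0, 0.5], [0.5, 2.0]]           # hypothetical common covariance matrix
w, c = linear_boundary(m1, m2, cov)
print("w =", np.round(w, 3), " C =", round(float(c), 3))
print([classify(x, w, c) for x in ([1.0, 1.0], [0.0, 0.0], [0.2, 1.5])])
```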

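When the covariance matrices for the two populations are not assumed to be equal, the likelihood ratio is

$$\frac{f_1(x)}{f_2(x)} = \frac{|V_2|^{1/2}\exp\left\{-\frac{1}{2}(x - M^{(1)})'V_1^{-1}(x - M^{(1)})\right\}}{|V_1|^{1/2}\exp\left\{-\frac{1}{2}(x - M^{(2)})'V_2^{-1}(x - M^{(2)})\right\}}.$$

The logarithm of the likelihood ratio, after rearranging terms, is

$$\log\frac{f_1(x)}{f_2(x)} = -\frac{1}{2}x'\left(V_1^{-1} - V_2^{-1}\right)x + x'\left(V_1^{-1}M^{(1)} - V_2^{-1}M^{(2)}\right) + \frac{1}{2}\log\frac{|V_2|}{|V_1|} - \frac{1}{2}M^{(1)\prime}V_1^{-1}M^{(1)} + \frac{1}{2}M^{(2)\prime}V_2^{-1}M^{(2)}.$$

The surface of constant likelihood ratio is then defined by

$$-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(v_1^{ij} - v_2^{ij}\right)x_i x_j + \sum_{i=1}^{N}\sum_{j=1}^{N}\left(v_1^{ij}m_j^{(1)} - v_2^{ij}m_j^{(2)}\right)x_i = \text{constant},$$

where $v_1^{ij}$ and $v_2^{ij}$ are elements of $V_1^{-1}$ and $V_2^{-1}$, respectively. The last two expressions may be written in the form: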
4.  Likelihood  Ratio  when  the  Variables  are  Binary

Let $X = (X_1, X_2, \ldots, X_N)$, $X_i = 0$ or $1$ for $i = 1, 2, \ldots, N$, denote the observables, and let $f_1(x)$ and $f_2(x)$ denote the joint probability functions of the $X_i$ in Group 1 and Group 2, respectively. In order to compute the likelihood ratio, it is necessary to consider expansions for the probability functions. Any parametric representation of an arbitrary distribution of N binary variables will in general require $(2^N - 1)$ independent parameters. A useful representation is an orthogonal expansion for joint probability functions of binary variables, as in Bahadur.*,** Define the following parameters for the two groups ($g = 1, 2$; $i = 1, 2, \ldots, N$):

$$m_i^{(g)} = E_g(X_i), \qquad y_i^{(g)} = \frac{x_i - m_i^{(g)}}{\sqrt{m_i^{(g)}\left(1 - m_i^{(g)}\right)}},$$

$$r_{ij}^{(g)} = E_g\left(y_i^{(g)} y_j^{(g)}\right), \quad r_{ijk}^{(g)} = E_g\left(y_i^{(g)} y_j^{(g)} y_k^{(g)}\right), \quad \ldots, \quad r_{12\ldots N}^{(g)} = E_g\left(y_1^{(g)} y_2^{(g)} \cdots y_N^{(g)}\right), \qquad \text{(H-15)}$$


* Bahadur, R. R., "A Representation of the Joint Distribution of Responses to n Dichotomous Items," USAF SAM Series in Statistics, Report No. 59-42, Randolph AFB, Texas, 1959; appears in Studies in Item Analysis and Prediction, Ed. Herbert Solomon, Stanford University Press, Stanford, California, 1961.

** Bahadur, R. R., "On Classification Based on Responses to n Dichotomous Items," USAF SAM Series in Statistics, Randolph AFB, Texas, 1959; appears in Studies in Item Analysis and Prediction, Ed. Herbert Solomon, Stanford University Press, Stanford, California, 1961.
where $E_g$ denotes that the expectation is taken with respect to the probability function of Group g, $g = 1, 2$. The $r_{ij}^{(1)}, r_{ijk}^{(1)}, \ldots, r_{12\ldots N}^{(1)}$ and $r_{ij}^{(2)}, r_{ijk}^{(2)}, \ldots, r_{12\ldots N}^{(2)}$ are the "correlation parameters" for Group 1 and Group 2, respectively. Let $p_1^{(g)}(x)$ denote the probability functions of the $X_i$ in the two groups when the $X_i$ are independently distributed but have the same mean values $m_i^{(g)}$ as they do when $f_g(x)$ holds. Then

$$p_1^{(g)}(x) = \prod_{i=1}^{N}\left(m_i^{(g)}\right)^{x_i}\left(1 - m_i^{(g)}\right)^{1 - x_i}. \qquad \text{(H-16)}$$


The functions $f_g(x)/p_1^{(g)}(x)$ can be expanded in the orthogonal expansions given by:

$$\frac{f_g(x)}{p_1^{(g)}(x)} = 1 + \sum_{i<j} r_{ij}^{(g)} y_i^{(g)} y_j^{(g)} + \sum_{i<j<k} r_{ijk}^{(g)} y_i^{(g)} y_j^{(g)} y_k^{(g)} + \cdots + r_{12\ldots N}^{(g)} y_1^{(g)} y_2^{(g)} \cdots y_N^{(g)}. \qquad \text{(H-17)}$$


Denoting the right side of Equation H-17 by $h^{(g)}(x)$,

$$f_g(x) = p_1^{(g)}(x)\cdot h^{(g)}(x). \qquad \text{(H-18)}$$


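The likelihood ratio $f_1(x)/f_2(x)$ is then given by

$$L(x) = \frac{p_1^{(1)}(x)\,h^{(1)}(x)}{p_1^{(2)}(x)\,h^{(2)}(x)}$$

or

$$\log L(x) = \log\frac{p_1^{(1)}(x)}{p_1^{(2)}(x)} + \log\left[1 + \sum_{i<j} r_{ij}^{(1)} y_i^{(1)} y_j^{(1)} + \cdots\right] - \log\left[1 + \sum_{i<j} r_{ij}^{(2)} y_i^{(2)} y_j^{(2)} + \cdots\right]. \qquad \text{(H-19)}$$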

By dropping terms of $h^{(g)}(x)$, different approximations can be obtained for $f_g(x)$ from Equation H-18. The result will be a legitimate probability function as long as the retained terms of $h^{(g)}(x)$ sum to a non-negative value for all x. The first-order approximation to $f_g(x)$ is $p_1^{(g)}(x)$. The second-order approximation is

$$f_g(x) \approx p_1^{(g)}(x)\left[1 + \sum_{i<j} r_{ij}^{(g)} y_i^{(g)} y_j^{(g)}\right], \qquad \text{(H-20)}$$

and so on.


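When first-order approximations to $f_1(x)$ and $f_2(x)$ are used, the logarithm of the likelihood ratio is easily seen to be

$$\log L(x) = \sum_{i=1}^{N}\left(a_i x_i + c_i\right), \qquad \text{(H-21)}$$

where

$$a_i = \log\frac{m_i^{(1)}\left(1 - m_i^{(2)}\right)}{m_i^{(2)}\left(1 - m_i^{(1)}\right)} \qquad \text{(H-22)}$$

and

$$c_i = \log\frac{1 - m_i^{(1)}}{1 - m_i^{(2)}}. \qquad \text{(H-23)}$$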
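Under the first-order (independence) approximation, the classifier is linear in the binary measurements. The sketch below computes a_i and c_i from Equations H-22 and H-23 for hypothetical feature probabilities and evaluates H-21. The accompanying test confirms that, for every binary pattern, the result equals the log ratio of the two product distributions of Equation H-16.

```python
import math


def first_order_coefficients(m1, m2):
    """a_i and c_i of Equations H-22 and H-23 from m_i^(g) = P(X_i = 1 | Group g)."""
    a = [math.log(p * (1.0 - q) / (q * (1.0 - p))) for p, q in zip(m1, m2)]
    c = [math.log((1.0 - p) / (1.0 - q)) for p, q in zip(m1, m2)]
    return a, c


def log_lr_first_order(x, a, c):
    """Equation H-21: sum of a_i x_i + c_i over the binary measurements."""
    return sum(ai * xi + ci for ai, xi, ci in zip(a, x, c))


m1 = [0.8, 0.3, 0.6]   # hypothetical Group 1 feature probabilities
m2 = [0.4, 0.5, 0.2]   # hypothetical Group 2 feature probabilities
a, c = first_order_coefficients(m1, m2)
print(f"log L(1, 0, 1) = {log_lr_first_order([1, 0, 1], a, c):.3f}")
```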


When the second-order approximation of Equation H-20 is used for $f_1(x)$ and $f_2(x)$, the logarithm of the likelihood ratio is

$$\log L(x) = \sum_{i=1}^{N}\left(a_i x_i + c_i\right) + \log\left[1 + \sum_{i<j} r_{ij}^{(1)} y_i^{(1)} y_j^{(1)}\right] - \log\left[1 + \sum_{i<j} r_{ij}^{(2)} y_i^{(2)} y_j^{(2)}\right], \qquad \text{(H-24)}$$

where $a_i$ and $c_i$ are as defined in Equations H-22 and H-23 and the other terms are as defined in Equation H-15.


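If a digital computer is being used for computation, the expression in Equation H-24 poses no major problem. If the "correlation parameters" are small enough that one can use the approximation $\log(1+\theta) \approx \theta$, then a function which is somewhat easier to implement than that of Equation H-24 is obtained, viz.,

$$\log L(x) \approx \sum_{i=1}^{N} (a_i x_i + c_i) + \sum_{i<j} \left[\gamma_{ij}^{(1)} \left(x_i - m_i^{(1)}\right)\left(x_j - m_j^{(1)}\right) - \gamma_{ij}^{(2)} \left(x_i - m_i^{(2)}\right)\left(x_j - m_j^{(2)}\right)\right], \qquad \text{(H-25)}$$

where

$$\gamma_{ij}^{(g)} = \frac{r_{ij}^{(g)}}{\sqrt{m_i^{(g)}\left(1 - m_i^{(g)}\right) m_j^{(g)}\left(1 - m_j^{(g)}\right)}}, \qquad g = 1, 2.$$

The coefficient of each product $x_i x_j$ in Equation H-25 is thus $\gamma_{ij}^{(1)} - \gamma_{ij}^{(2)}$.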
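The effect of the approximation $\log(1+\theta) \approx \theta$ can be gauged with the Python sketch below, which evaluates Equation H-24 exactly and in linearized form for every binary vector. It assumes the standardized variables $y_i^{(g)} = (x_i - m_i^{(g)})/\sqrt{m_i^{(g)}(1 - m_i^{(g)})}$ of Section 4; the parameter values and function names are hypothetical.

```python
from itertools import combinations, product
from math import log, sqrt

def pairwise_term(x, m, r):
    """theta = sum_{i<j} r_ij y_i y_j for one group, with y_i standardized as in Section 4."""
    y = [(xi - mi) / sqrt(mi * (1 - mi)) for xi, mi in zip(x, m)]
    return sum(r.get((i, j), 0.0) * y[i] * y[j]
               for i, j in combinations(range(len(x)), 2))

def log_lr_second_order(x, m1, m2, r1, r2, linearize=False):
    # First-order part, equal to sum_i (a_i x_i + c_i) of Equation H-21
    base = sum(log((p if xi else 1 - p) / (q if xi else 1 - q))
               for xi, p, q in zip(x, m1, m2))
    t1, t2 = pairwise_term(x, m1, r1), pairwise_term(x, m2, r2)
    if linearize:
        # log(1 + theta) ~ theta: a quadratic polynomial in the x_i
        return base + t1 - t2
    # Equation H-24
    return base + log(1 + t1) - log(1 + t2)

# Hypothetical parameters with small correlations
m1, m2 = [0.6, 0.5, 0.4], [0.5, 0.5, 0.5]
r1, r2 = {(0, 1): 0.05}, {(1, 2): 0.03}
gaps = [abs(log_lr_second_order(x, m1, m2, r1, r2)
            - log_lr_second_order(x, m1, m2, r1, r2, linearize=True))
        for x in product((0, 1), repeat=3)]
print(max(gaps))
```

With correlations of a few hundredths the two forms differ by only a few thousandths; the discrepancy per group is roughly $\theta^2/2$, so it grows quickly as the correlations increase.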


5. The Anderson-Bahadur Plane Boundary


The recent paper by Anderson and Bahadur* presents the class of best (admissible) linear procedures for the case of two multivariate normal distributions which differ in mean vectors and have unequal covariance matrices. For the case of arbitrary distributions, i.e., distributions not necessarily normal, the procedure finds a direction of projection for the two samples and computes a linear function for which the ratio of the difference between the means of the projected samples to the sum of the standard deviations of the projected samples is maximized. If the direction of projection is given by the column vector $A = (a_1, a_2, \ldots, a_N)'$ and $M^{(1)}$ and $M^{(2)}$ are the mean vectors for Group 1 and Group 2, with $V_1$ and $V_2$ as the corresponding covariance matrices, the minimax Anderson-Bahadur procedure finds the vector $A$ which maximizes the ratio


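$$\frac{A'\left(M^{(1)} - M^{(2)}\right)}{\left(A'V_1A\right)^{1/2} + \left(A'V_2A\right)^{1/2}}. \qquad \text{(H-26)}$$

The resulting $A$ is given by

$$A = \left[t V_1 + (1-t) V_2\right]^{-1} \left(M^{(1)} - M^{(2)}\right), \qquad \text{(H-27)}$$

where $t$ is a scalar, $0 < t < 1$, given by

$$A'\left[t^2 V_1 - (1-t)^2 V_2\right] A = 0. \qquad \text{(H-28)}$$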


Starting with an initial value for $t$, $A$ is obtained from Equation H-27; these values for $t$ and $A$ are then used to see if Equation H-28 is satisfied. In this manner $A$ is solved for iteratively. In terms of the vector of measurements $X = (x_1, x_2, \ldots, x_N)'$, the plane boundary for classification is given by


$$A'X = C, \qquad \text{(H-29)}$$

where

$$C = \frac{\left(A'V_2A\right)^{1/2} A'M^{(1)} + \left(A'V_1A\right)^{1/2} A'M^{(2)}}{\left(A'V_1A\right)^{1/2} + \left(A'V_2A\right)^{1/2}}. \qquad \text{(H-30)}$$

*Anderson, T.W. and Bahadur, R.R., "Classification Into Two Multivariate Normal Distributions with Different Covariance Matrices," Ann. Math. Stat., Vol. 33, No. 2, pp. 420-431, June 1962; presented at I.M.S. Meeting, January 1960.


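In computing the above expressions when we have $n_1$ samples from Group 1 and $n_2$ samples from Group 2, the quantities $M^{(1)}$ and $M^{(2)}$ would be replaced by $\bar{x}^{(1)}$ and $\bar{x}^{(2)}$, respectively, and $V_1$ and $V_2$ by $S_1$ and $S_2$, respectively, where

$$\bar{x}^{(g)} = \left(\bar{x}_1^{(g)}, \bar{x}_2^{(g)}, \ldots, \bar{x}_N^{(g)}\right)', \qquad g = 1, 2, \qquad \text{(H-31)}$$

$$\bar{x}_i^{(g)} = \frac{1}{n_g} \sum_{r=1}^{n_g} x_{ir}^{(g)}, \qquad i = 1, 2, \ldots, N; \; g = 1, 2, \qquad \text{(H-32)}$$

and

$$S_g = \left[s_{ij}^{(g)}\right], \qquad \text{(H-33)}$$

where

$$s_{ij}^{(g)} = \frac{1}{n_g - 1} \sum_{r=1}^{n_g} \left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right). \qquad \text{(H-34)}$$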
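A sketch of the iterative solution of Equations H-27 and H-28, written in Python with NumPy. The report only says that $t$ is adjusted until Equation H-28 holds; the bisection used here is an assumed choice, which works because the left side of Equation H-28 is negative at $t = 0$ and positive at $t = 1$, so a root lies between them. The function name and the test parameters are hypothetical.

```python
import numpy as np

def anderson_bahadur(M1, M2, V1, V2, tol=1e-12, max_iter=200):
    """Minimax plane boundary A'X = C of Equations H-26 to H-30.

    The scalar t in Equation H-28 is located by bisection on (0, 1);
    this is one possible iteration, not necessarily the one used in the report.
    """
    M1, M2 = np.asarray(M1, float), np.asarray(M2, float)
    V1, V2 = np.asarray(V1, float), np.asarray(V2, float)
    d = M1 - M2

    def direction(t):
        # Equation H-27: A = [t V1 + (1 - t) V2]^{-1} (M1 - M2)
        return np.linalg.solve(t * V1 + (1 - t) * V2, d)

    def condition(t):
        # Left side of Equation H-28: negative at t = 0, positive at t = 1
        A = direction(t)
        return A @ (t**2 * V1 - (1 - t)**2 * V2) @ A

    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if condition(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    t = 0.5 * (lo + hi)
    A = direction(t)
    s1, s2 = np.sqrt(A @ V1 @ A), np.sqrt(A @ V2 @ A)
    # Equation H-30: threshold at equal standardized distance from both projected means
    C = (s2 * (A @ M1) + s1 * (A @ M2)) / (s1 + s2)
    return t, A, C

# Hypothetical population parameters: Group 2 is more spread out along x_1
t, A, C = anderson_bahadur([1.0, 0.0], [0.0, 0.0],
                           [[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 1.0]])
print(t, A, C)  # t = 2/3, A = [0.5, 0], C = 1/3 up to rounding
```

For sample data, the means and covariance matrices would be replaced by $\bar{x}^{(g)}$ and $S_g$ of Equations H-31 to H-34; `X.mean(axis=0)` and `np.cov(X, rowvar=False)`, whose default divisor is $n_g - 1$, give these directly.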



6. Multidimensional Scatter*, **

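Let us rewrite the expression of Equation H-34, for the elements of the sample covariance matrix, as follows:

$$(n_g - 1)\, s_{ij}^{(g)} = u_{ij}^{(g)} = \sum_{r=1}^{n_g} \left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right). \qquad \text{(H-35)}$$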

* Wilks, S.S., "Multidimensional Statistical Scatter," Contributions to Probability and Statistics in Honor of Harold Hotelling, Stanford Univ. Press, 1960.

** Wilks, S.S., Mathematical Statistics, New York, John Wiley and Sons, 1962.




Wilks uses the name internal scatter matrix for the matrix $[u_{ij}^{(g)}]$ and he calls the determinant $|u_{ij}^{(g)}|$ the internal scatter of the sample $g$, $g = 1, 2$. It is instructive to consider the origin of this terminology.

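Let

$$x_r = (x_{1r}, x_{2r}, \ldots, x_{Nr}), \qquad r = 1, 2, \ldots, n,$$

be a sample of size $n$ from an $N$-dimensional distribution, and let $(x_{10}, x_{20}, \ldots, x_{N0})$ be some pivotal point about which we wish to define the scatter of the sample. Define the matrix

$$H = \begin{bmatrix} x_{11} - x_{10} & \cdots & x_{N1} - x_{N0} \\ \vdots & & \vdots \\ x_{1n} - x_{10} & \cdots & x_{Nn} - x_{N0} \end{bmatrix}. \qquad \text{(H-36)}$$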


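Then the scatter matrix is

$$H'H = \begin{bmatrix} \sum_{r=1}^{n} (x_{1r} - x_{10})^2 & \cdots & \sum_{r=1}^{n} (x_{1r} - x_{10})(x_{Nr} - x_{N0}) \\ \vdots & & \vdots \\ \sum_{r=1}^{n} (x_{Nr} - x_{N0})(x_{1r} - x_{10}) & \cdots & \sum_{r=1}^{n} (x_{Nr} - x_{N0})^2 \end{bmatrix}. \qquad \text{(H-37)}$$

The scatter, denoted by ${}_N S_{x_0,n}$, is defined as the determinant of $H'H$, i.e.,

$${}_N S_{x_0,n} = \left| H'H \right|. \qquad \text{(H-38)}$$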


The geometrical interpretation of the scatter is as follows. The sample of size $n$ and the pivotal point can be represented as $n+1$ points in an $N$-dimensional Euclidean space. Consider the $N+1$ points obtained by using $N$ out of the $n$ sample points and the pivotal point. There are $N+1$ ways of choosing an additional point which, together with the $N+1$ points already picked, will form an $N$-dimensional parallelotope. Note that $N+1$ different parallelotopes can result, but they will all have the same absolute value for their $N$-dimensional volumes. This absolute value is called the $N$-dimensional content determined by the $N$ chosen sample points $(x_{1r}, x_{2r}, \ldots, x_{Nr})$ and the pivotal point $(x_{10}, x_{20}, \ldots, x_{N0})$. There are $\binom{n}{N}$ different ways of choosing $N$ out of $n$ sample points. The scatter defined in Equation H-38 is equal to the sum of the squares of the $N$-dimensional contents determined by $(x_{10}, \ldots, x_{N0})$ and each of the $\binom{n}{N}$ different possible choices of $N$ sample points $(x_{1r}, x_{2r}, \ldots, x_{Nr})$.
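The identity behind this interpretation is the Cauchy-Binet formula for $|H'H|$. The Python sketch below, with a randomly generated hypothetical sample, compares the determinant of Equation H-38 with the sum of squared contents taken over every choice of $N$ sample points. Function names are illustrative.

```python
from itertools import combinations
import numpy as np

def scatter(X, x0):
    """Scatter of Equation H-38: |H'H| with H the n x N matrix of deviations from the pivot."""
    H = np.asarray(X, float) - np.asarray(x0, float)
    return np.linalg.det(H.T @ H)

def scatter_from_contents(X, x0):
    """Sum of squared N-dimensional contents over all C(n, N) choices of N sample points."""
    H = np.asarray(X, float) - np.asarray(x0, float)
    n, N = H.shape
    # Each N x N submatrix of H spans one parallelotope; |det| is its content
    return sum(np.linalg.det(H[list(rows)]) ** 2 for rows in combinations(range(n), N))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))      # hypothetical sample: n = 6 points in N = 3 dimensions
x0 = np.array([0.5, -1.0, 2.0])  # arbitrary pivotal point
print(scatter(X, x0), scatter_from_contents(X, x0))
```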

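Now, splitting each deviation from the pivot as $x_{ir} - x_{i0} = (x_{ir} - \bar{x}_i) + (\bar{x}_i - x_{i0})$ and noting that the cross-product terms vanish because $\sum_{r=1}^{n} (x_{ir} - \bar{x}_i) = 0$, we obtain

$$\sum_{r=1}^{n} (x_{ir} - x_{i0})(x_{jr} - x_{j0}) = \sum_{r=1}^{n} (x_{ir} - \bar{x}_i)(x_{jr} - \bar{x}_j) + n(\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0}) = u_{ij} + n(\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0}). \qquad \text{(H-39)}$$

Thus Equation H-38 can be written as

$${}_N S_{x_0,n} = \begin{vmatrix} u_{11} + n(\bar{x}_1 - x_{10})^2 & \cdots & u_{1N} + n(\bar{x}_1 - x_{10})(\bar{x}_N - x_{N0}) \\ \vdots & & \vdots \\ u_{N1} + n(\bar{x}_N - x_{N0})(\bar{x}_1 - x_{10}) & \cdots & u_{NN} + n(\bar{x}_N - x_{N0})^2 \end{vmatrix}. \qquad \text{(H-40)}$$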

If $[u_{ij}]$ is a positive definite matrix, the above equation can be written as

$${}_N S_{x_0,n} = |u_{ij}| \left[1 + n \sum_{i,j=1}^{N} u^{ij} (\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0})\right], \qquad \text{(H-41)}$$

where $u^{ij}$ are the elements of the inverse matrix $[u_{ij}]^{-1}$. The matrix $[u_{ij}]$ will be positive definite if and only if the $n$ sample points do not lie in a flat space of less than $N$ dimensions. Since the quadratic form in Equation H-41 is then non-negative and vanishes only when every $\bar{x}_i - x_{i0} = 0$, it follows that the minimum value of the scatter ${}_N S_{x_0,n}$ occurs when the pivotal point $(x_{10}, x_{20}, \ldots, x_{N0})$ is chosen as the vector of sample means $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_N)$. This minimum value is then

$${}_N S_{\bar{x},n} = |u_{ij}|. \qquad \text{(H-42)}$$




The determinant $|u_{ij}|$ is called the internal scatter of the sample, and the matrix $[u_{ij}]$ is called the internal scatter matrix.
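The following Python sketch checks Equation H-41 against the definition of Equation H-38 and confirms that moving the pivot to the sample means gives the smaller value of Equation H-42. The data and function names are hypothetical.

```python
import numpy as np

def internal_scatter_matrix(X):
    """[u_ij]: sums of squares and products of deviations from the sample means."""
    D = np.asarray(X, float) - np.mean(X, axis=0)
    return D.T @ D

def scatter_about(X, x0):
    """Scatter about the pivot x0 computed from Equation H-41."""
    X = np.asarray(X, float)
    U = internal_scatter_matrix(X)
    e = X.mean(axis=0) - np.asarray(x0, float)
    return np.linalg.det(U) * (1 + len(X) * e @ np.linalg.solve(U, e))

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))        # hypothetical sample, n = 8, N = 3
x0 = np.array([0.5, -1.0, 2.0])    # arbitrary pivotal point
H = X - x0
direct = np.linalg.det(H.T @ H)                # Equation H-38
at_mean = scatter_about(X, X.mean(axis=0))     # Equation H-42
print(direct, scatter_about(X, x0), at_mean)
```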


7. Linear Discriminant Analysis Based on Scatter: Two Groups


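Let

$$x_r^{(1)} = \left(x_{1r}^{(1)}, x_{2r}^{(1)}, \ldots, x_{Nr}^{(1)}\right), \qquad r = 1, 2, \ldots, n_1,$$

and

$$x_r^{(2)} = \left(x_{1r}^{(2)}, x_{2r}^{(2)}, \ldots, x_{Nr}^{(2)}\right), \qquad r = 1, 2, \ldots, n_2,$$

be two samples of size $n_1$ and $n_2$, with $n_1 > N$ and $n_2 > N$. Let $\bar{x}_i^{(1)}$ and $\bar{x}_i^{(2)}$ be, respectively, the means of the $x_i$, $i = 1, 2, \ldots, N$, for the two samples. Let $[u_{ij}^{(1)}]$ and $[u_{ij}^{(2)}]$ denote, respectively, the internal scatter matrices of the two samples, with $u_{ij}^{(g)}$, $g = 1, 2$, as defined in Equation H-35. Assume also that the internal scatter matrices are nonsingular. By pooling the two samples, one obtains a grand sample for which the sample means are denoted by $\bar{x}_i$ and the internal scatter matrix by $[u_{ij}]$. Now define the within-samples scatter matrix $W = [w_{ij}]$ by

$$W = [w_{ij}] = \left[u_{ij}^{(1)} + u_{ij}^{(2)}\right]. \qquad \text{(H-43)}$$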


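Note that if we assumed the covariance matrices of the two groups to be equal, the sample estimate of the common covariance matrix would be given by $[s_{ij}]$, where

$$(n_1 + n_2 - 2)\, s_{ij} = \sum_{r=1}^{n_1} \left(x_{ir}^{(1)} - \bar{x}_i^{(1)}\right)\left(x_{jr}^{(1)} - \bar{x}_j^{(1)}\right) + \sum_{r=1}^{n_2} \left(x_{ir}^{(2)} - \bar{x}_i^{(2)}\right)\left(x_{jr}^{(2)} - \bar{x}_j^{(2)}\right) = w_{ij}. \qquad \text{(H-44)}$$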




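Now the internal scatter matrix of the grand sample whose mean is $\bar{x}_i$ is $[u_{ij}]$, where

$$u_{ij} = u_{ij}^{(1)} + u_{ij}^{(2)} + n_1\left(\bar{x}_i^{(1)} - \bar{x}_i\right)\left(\bar{x}_j^{(1)} - \bar{x}_j\right) + n_2\left(\bar{x}_i^{(2)} - \bar{x}_i\right)\left(\bar{x}_j^{(2)} - \bar{x}_j\right), \qquad \text{(H-45)}$$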


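so that the between-samples scatter matrix $B$ given by

$$B = [b_{ij}]$$

has its elements

$$\begin{aligned} b_{ij} &= u_{ij} - w_{ij} \\ &= n_1\left(\bar{x}_i^{(1)} - \bar{x}_i\right)\left(\bar{x}_j^{(1)} - \bar{x}_j\right) + n_2\left(\bar{x}_i^{(2)} - \bar{x}_i\right)\left(\bar{x}_j^{(2)} - \bar{x}_j\right) \\ &= \frac{n_1 n_2}{n_1 + n_2}\left(\bar{x}_i^{(1)} - \bar{x}_i^{(2)}\right)\left(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}\right). \end{aligned} \qquad \text{(H-46)}$$

Therefore, $b_{ij} = b_i b_j$, where

$$b_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\left(\bar{x}_i^{(1)} - \bar{x}_i^{(2)}\right) = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, d_i.$$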


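Let $A = (a_1, a_2, \ldots, a_N)'$ be an arbitrary vector and let the two samples be denoted by $X^{(g)}$, $g = 1, 2$, so that

$$X^{(g)} = \begin{bmatrix} x_{11}^{(g)} & x_{12}^{(g)} & \cdots & x_{1n_g}^{(g)} \\ \vdots & \vdots & & \vdots \\ x_{N1}^{(g)} & x_{N2}^{(g)} & \cdots & x_{Nn_g}^{(g)} \end{bmatrix}.$$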


Projecting the two samples onto a line whose direction cosines are proportional to the elements of the vector $A$ gives

$$Z^{(g)} = A'X^{(g)}, \qquad g = 1, 2, \qquad \text{(H-47)}$$

and $Z^{(1)} = \left(z_1^{(1)}, \ldots, z_{n_1}^{(1)}\right)$ and $Z^{(2)} = \left(z_1^{(2)}, \ldots, z_{n_2}^{(2)}\right)$ are the two one-dimensional samples obtained as a result of the projection.

Let $\bar{z}^{(1)}$ and $\bar{z}^{(2)}$ be the means of the two one-dimensional samples of $z$'s and let $\bar{z}$ be the mean of the pooled sample. Then, if ${}_1S_{\bar{z},\,n_1+n_2}$ is the scatter of the grand sample obtained by pooling the two one-dimensional samples of $z$'s, we can write

$${}_1S_{\bar{z},\,n_1+n_2} = S_W + S_B, \qquad \text{(H-48)}$$


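where $S_W$ is the within-samples scatter of the $z$'s, given by

$$S_W = \sum_{r=1}^{n_1} \left(z_r^{(1)} - \bar{z}^{(1)}\right)^2 + \sum_{r=1}^{n_2} \left(z_r^{(2)} - \bar{z}^{(2)}\right)^2, \qquad \text{(H-49)}$$

and $S_B$ is the between-samples scatter of the $z$'s, given by

$$S_B = n_1\left(\bar{z}^{(1)} - \bar{z}\right)^2 + n_2\left(\bar{z}^{(2)} - \bar{z}\right)^2 = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{z}^{(1)} - \bar{z}^{(2)}\right)^2. \qquad \text{(H-50)}$$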


The approach of linear discriminant analysis based on scatter is to determine the particular value of $A = (a_1, a_2, \ldots, a_N)'$ which maximizes $S_B$, the between-samples scatter, for a fixed value of $S_W$, the within-samples scatter. Thus, the problem is to maximize $S_B$ subject to the constraint $S_W$ = constant.

Now,

$$S_B = A'BA, \qquad \text{(H-51)}$$

where $B$ is the matrix defined in Equation H-46.




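Similarly,

$$S_W = A'WA. \qquad \text{(H-52)}$$

Using $\lambda$ to denote a Lagrange multiplier, a necessary condition for a maximum is then

$$\frac{\partial}{\partial A}\left[S_B - \lambda\left(S_W - \text{const}\right)\right] = 0,$$

or

$$2A'B - 2\lambda A'W = 0,$$

or

$$A'\left[B - \lambda W\right] = 0. \qquad \text{(H-53)}$$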


For a non-zero value of $A$, the determinant of the term in brackets must be zero, i.e.,

$$|B - \lambda W| = 0, \qquad \text{(H-54)}$$

or

$$|b_{ij} - \lambda w_{ij}| = 0,$$

or

$$|b_i b_j - \lambda w_{ij}| = 0.$$


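Since $B = [b_i b_j]$ has rank one, this equation has only one non-zero root. Then,

$$\lambda_1 = \sum_{i,j=1}^{N} w^{ij} b_i b_j = \frac{n_1 n_2}{n_1 + n_2}\, d'W^{-1}d, \qquad \text{(H-55)}$$

where $w^{ij}$ are the elements of $W^{-1}$, $b_i$ is as defined in Equation H-46, and

$$d = (d_1, d_2, \ldots, d_N)' \quad \text{and} \quad d_i = \bar{x}_i^{(1)} - \bar{x}_i^{(2)}. \qquad \text{(H-56)}$$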
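A numerical check of Equations H-46 and H-55, sketched in Python with hypothetical random samples: the between-samples matrix obtained as $U - W$ should equal $\frac{n_1 n_2}{n_1+n_2} dd'$, and the only non-zero root of $|B - \lambda W| = 0$, computed here as an eigenvalue of $W^{-1}B$, should equal $\lambda_1$. The function name is illustrative.

```python
import numpy as np

def two_group_scatter(X1, X2):
    """W (Equation H-43), B (Equation H-46) and d (Equation H-56); rows are sample members."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    W = sum((X - X.mean(0)).T @ (X - X.mean(0)) for X in (X1, X2))
    pooled = np.vstack([X1, X2])
    U = (pooled - pooled.mean(0)).T @ (pooled - pooled.mean(0))  # grand-sample scatter, H-45
    B = U - W
    d = X1.mean(0) - X2.mean(0)
    return W, B, d

rng = np.random.default_rng(2)
X1 = rng.normal(size=(12, 3)) + [1.0, 0.0, 0.5]   # hypothetical Group 1 sample
X2 = rng.normal(size=(9, 3))                      # hypothetical Group 2 sample
n1, n2 = len(X1), len(X2)
W, B, d = two_group_scatter(X1, X2)
lam1 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(W, d)   # Equation H-55
roots = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)
print(lam1, roots)  # the largest root matches lam1; the other roots are numerically zero
```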


The value of $A$ corresponding to this root $\lambda_1$ satisfies the equation

$$\sum_{j=1}^{N} \left(b_i b_j - \lambda_1 w_{ij}\right) a_j = 0, \qquad i = 1, 2, \ldots, N. \qquad \text{(H-57)}$$


The value of $A$ can be determined within a constant of proportionality by noting that

$$S_B = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{z}^{(1)} - \bar{z}^{(2)}\right)^2 = \frac{n_1 n_2}{n_1 + n_2}\,(d'A)^2$$

and

$$S_W = A'WA = (n_1 + n_2 - 2)\, A'SA,$$

where $S$ is the sample estimate of the common covariance matrix for the two groups. Then,

$$\frac{\partial}{\partial A}\left[S_B - \lambda\left(S_W - \text{const}\right)\right] = 0$$

gives

$$\frac{n_1 n_2}{n_1 + n_2} \cdot 2\,(d'A)\, d' - \lambda \cdot 2\,(n_1 + n_2 - 2)\, A'S = 0$$

or

$$\frac{n_1 n_2\, (d'A)}{(n_1 + n_2)\,\lambda\,(n_1 + n_2 - 2)}\, d' = A'S.$$

Since every element of $d'$ is multiplied by the same term and since we can determine the elements of $A$ only within a constant of proportionality, we set

$$\frac{n_1 n_2\, (d'A)}{(n_1 + n_2)\,\lambda\,(n_1 + n_2 - 2)} = \text{constant, say } K. \qquad \text{(H-58)}$$


Then, noting that $S$ is a symmetric matrix, from $K d' = A'S$,

$$A = K S^{-1} d.$$

With $K = 1$ we get the same direction as with any other value of $K$. Using $A = S^{-1}d$, the linear function

$$x'A = x'S^{-1}d \qquad \text{(H-59)}$$

which is obtained is Fisher's linear discriminant function. The boundary for classification is given by

$$x'S^{-1}d = \text{constant}. \qquad \text{(H-60)}$$
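Fisher's direction can be compared with arbitrary directions in a short Python sketch (hypothetical data, illustrative function names): the ratio $S_B/S_W$ of the projected samples should never exceed its value at $A = S^{-1}d$.

```python
import numpy as np

def fisher_direction(X1, X2):
    """A = S^{-1} d of Equation H-59, with S the pooled covariance estimate of Equation H-44."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2 = len(X1), len(X2)
    W = sum((X - X.mean(0)).T @ (X - X.mean(0)) for X in (X1, X2))
    S = W / (n1 + n2 - 2)
    d = X1.mean(0) - X2.mean(0)
    return np.linalg.solve(S, d), W, d

def scatter_ratio(A, n1, n2, W, d):
    """S_B / S_W for the samples projected on A (Equations H-50 to H-52)."""
    return (n1 * n2 / (n1 + n2)) * (d @ A) ** 2 / (A @ W @ A)

rng = np.random.default_rng(3)
X1 = rng.normal(size=(15, 2)) + [2.0, 1.0]   # hypothetical Group 1 sample
X2 = rng.normal(size=(10, 2))                # hypothetical Group 2 sample
A, W, d = fisher_direction(X1, X2)
best = scatter_ratio(A, len(X1), len(X2), W, d)
trials = [scatter_ratio(v, len(X1), len(X2), W, d) for v in rng.normal(size=(200, 2))]
print(best, max(trials))  # no random direction beats Fisher's direction
```

Equation H-60 leaves the constant open. A common choice, assumed here and not taken from the text, is the value of $x'S^{-1}d$ at the midpoint $\frac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)})$.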


Note that maximizing $S_B$ while keeping $S_W$ constant is the same as maximizing the ratio $S_B/S_W$, or minimizing the ratio $S_W/(S_B + S_W)$, since neither ratio changes when $A$ is multiplied by a scalar.


Alternative  derivations  of  the  above  direction  of  projection  have 
been  given  recently  in  the  pattern  recognition  literature  in  terms  of 
average  Euclidean  distance  between  pairs  of  points  within  groups,  between 
groups, and among pooled samples. These "new" formulations are
trivially  different,  do  not  lead  to  any  new  results,  and  are  more 
cumbersome,  since  the  average  Euclidean  distance  between  pairs  of 
points  does  not  have  the  simple  relationship  to  spread  of  a  sample  as 
does  the  scatter  about  a  pivotal  point  as  formulated  in  multivariate 
statistical  analysis  and  described  above. 

8.  Multiple  Linear  Discriminant  Functions 

When there are $G$ groups, $G > 2$, we can proceed to consider each pair of groups separately. However, it is possible to study overall relationships of the groups with a certain amount of parsimony by defining within-samples and between-samples scatter matrices for $G$ groups. The within-samples scatter matrix now has elements

$$w_{ij} = \sum_{g=1}^{G} \sum_{r=1}^{n_g} \left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right) \qquad \text{(H-61)}$$

and the between-samples scatter matrix has elements

$$b_{ij} = \sum_{g=1}^{G} n_g \left(\bar{x}_i^{(g)} - \bar{x}_i\right)\left(\bar{x}_j^{(g)} - \bar{x}_j\right), \qquad \text{(H-62)}$$

where $\bar{x}_i$ denotes the mean of the pooled sample of all $G$ groups.

The within-samples scatter matrix $[w_{ij}]$ is sometimes called the SSW, SPW matrix, the terms referring to "sums of squares within" and "sums of products within." Similarly, the matrix $[b_{ij}]$ is sometimes called the SSB, SPB matrix, these terms referring to "sums of squares between" and "sums of products between."
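For $G$ groups the two matrices of Equations H-61 and H-62 can be accumulated group by group. The sketch below (Python with NumPy, hypothetical data with $G = 3$ and $N = 4$) also checks that they add up to the scatter matrix of the grand sample and that $W^{-1}B$ has at most $\min(G-1, N)$ non-zero eigenvalues, the rank bound that governs the number of discriminant functions.

```python
import numpy as np

def multigroup_scatter(groups):
    """Within-samples W (Equation H-61) and between-samples B (Equation H-62) for G samples."""
    groups = [np.asarray(X, float) for X in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    N = groups[0].shape[1]
    W, B = np.zeros((N, N)), np.zeros((N, N))
    for X in groups:
        dev = X - X.mean(axis=0)
        W += dev.T @ dev
        m = X.mean(axis=0) - grand_mean
        B += len(X) * np.outer(m, m)
    return W, B

rng = np.random.default_rng(4)
offsets = ([0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0])        # hypothetical group means, G = 3
groups = [rng.normal(size=(10, 4)) + o for o in offsets]    # N = 4 variables
W, B = multigroup_scatter(groups)
pooled = np.vstack(groups)
T = (pooled - pooled.mean(0)).T @ (pooled - pooled.mean(0))  # scatter of the grand sample
roots = np.linalg.eigvals(np.linalg.solve(W, B))
print(np.allclose(T, W + B), np.sum(np.abs(roots) > 1e-8))  # True, min(G - 1, N) = 2
```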


Proceeding as in the two-group case, one sets up the condition for maximizing the between-samples scatter of the projected samples while keeping the within-samples scatter constant. As in the two-group case, this leads to the condition that

$$\left|W^{-1}B - \lambda I\right| = 0. \qquad \text{(H-63)}$$


Unlike the two-group case, though, there will now be $\min(G-1, N)$ roots $\lambda_j$ which are non-zero, unless it so happens that all the sample group means lie along the same line, in which case only one root is non-zero. Corresponding to each root $\lambda_j$ will be a vector $A_j$ which is a solution of the equation

$$\left(W^{-1}B - \lambda_j I\right) A_j = 0. \qquad \text{(H-64)}$$

Each $A_j$ leads to a linear discriminant function $Z_j = \sum_{i=1}^{N} a_{ji} x_i$. The samples may now be represented in this space of reduced dimensions, which is referred to as the "discriminant space." Classification in the new space can then proceed according to available procedures. Since the new variables $Z_j$ are weighted linear sums of the variables $x_i$, even when the $x_i$ are discrete variables the approximation of multivariate normal distributions for the new variables is likely to prove useful.

9. Euclidean Distance as a Means for Classification of Projected Samples

Some investigators have proposed the use of the mean-squared Euclidean distance between pairs of points after projection of the samples (onto a subspace obtained from the original $N$-dimensional space by a linear transformation) as a suitable measure by which to classify a new sample into one of two groups. It is easily shown that computing ordinary Euclidean distance in the new space does not result in a quadratic discriminant function but only in a linear function. Thus, the additional computation needed to obtain Euclidean distances can be avoided. (Only when the coordinates of the new space are suitably weighted, as, for example, when the functions defining the discriminant space are weighted in a manner which accounts for their relative importance, does the computation of Euclidean distances prove useful.)


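Let $Z = (z_1, z_2, \ldots, z_K)$ be the new sample after projection into the transformed space. Let $z_{r_1}^{(1)} = \left(z_{1r_1}^{(1)}, \ldots, z_{Kr_1}^{(1)}\right)$, $r_1 = 1, 2, \ldots, n_1$, be the samples of Group 1 after projection and $z_{r_2}^{(2)} = \left(z_{1r_2}^{(2)}, \ldots, z_{Kr_2}^{(2)}\right)$, $r_2 = 1, 2, \ldots, n_2$, be the samples of Group 2 after projection. Denote by $\bar{z}_i^{(g)}$ and $\overline{(z_i^{(g)})^2}$ the mean and the mean square of the $i$th projected coordinate over the samples of Group $g$. The average squared Euclidean distance between the new sample and points belonging to Group 1 is

$$\frac{1}{n_1} \sum_{r_1=1}^{n_1} \sum_{i=1}^{K} \left(z_i - z_{ir_1}^{(1)}\right)^2 \qquad \text{(H-65)}$$

$$= \sum_{i=1}^{K} z_i^2 - 2\sum_{i=1}^{K} \bar{z}_i^{(1)} z_i + \sum_{i=1}^{K} \overline{\left(z_i^{(1)}\right)^2}. \qquad \text{(H-66)}$$

Similarly, the average squared Euclidean distance between the new sample and points belonging to Group 2 is

$$\frac{1}{n_2} \sum_{r_2=1}^{n_2} \sum_{i=1}^{K} \left(z_i - z_{ir_2}^{(2)}\right)^2 = \sum_{i=1}^{K} z_i^2 - 2\sum_{i=1}^{K} \bar{z}_i^{(2)} z_i + \sum_{i=1}^{K} \overline{\left(z_i^{(2)}\right)^2}. \qquad \text{(H-67)}$$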


Taking one-half of the difference between the two (Group 1 minus Group 2) gives the expression

$$\sum_{i=1}^{K} \left(\bar{z}_i^{(2)} - \bar{z}_i^{(1)}\right) z_i + \frac{1}{2} \sum_{i=1}^{K} \left[\overline{\left(z_i^{(1)}\right)^2} - \overline{\left(z_i^{(2)}\right)^2}\right] \qquad \text{(H-68)}$$

as the classification procedure. This expression is a linear function in the $z_i$. Furthermore, the coefficients of the $z_i$ are simply the difference between the mean values of the projected coordinates in the two groups, the projected coordinates having been obtained by a linear transformation from the variables $x_i$. Thus no quadratic discriminant function results.
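The cancellation of the quadratic terms is easy to confirm numerically. In the Python sketch below, the data are hypothetical projected samples with $K = 2$; one-half of the difference of the two average squared distances is compared with the linear expression of Equation H-68. Function names are illustrative.

```python
import numpy as np

def mean_sq_distance(z, Z):
    """Average squared Euclidean distance from z to the rows of Z (Equations H-65 and H-67)."""
    return np.mean(np.sum((np.asarray(Z, float) - z) ** 2, axis=1))

def linear_rule(z, Z1, Z2):
    """The linear expression of Equation H-68."""
    Z1, Z2 = np.asarray(Z1, float), np.asarray(Z2, float)
    coef = Z2.mean(axis=0) - Z1.mean(axis=0)
    const = 0.5 * (np.mean(np.sum(Z1**2, axis=1)) - np.mean(np.sum(Z2**2, axis=1)))
    return coef @ z + const

rng = np.random.default_rng(5)
Z1 = rng.normal(size=(7, 2))           # hypothetical projected Group 1 sample (K = 2)
Z2 = rng.normal(size=(5, 2)) + 1.5     # hypothetical projected Group 2 sample
z = np.array([0.4, 0.9])               # new sample to classify
half_diff = 0.5 * (mean_sq_distance(z, Z1) - mean_sq_distance(z, Z2))
print(half_diff, linear_rule(z, Z1, Z2))  # equal: the quadratic terms in z cancel
```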


10. Mahalanobis' $D^2$*, **


For two samples of size $n_1$ and $n_2$ with $N$ characteristics measured on each member of each sample,

$$D_N^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} s^{ij} \left(\bar{x}_i^{(1)} - \bar{x}_i^{(2)}\right)\left(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}\right), \qquad \text{(H-69)}$$

where the matrix $[s^{ij}]$ is the inverse of $[s_{ij}]$, and

$$(n_1 + n_2 - 2)\, s_{ij} = \sum_{g=1}^{2} \sum_{r=1}^{n_g} \left(x_{ir}^{(g)} - \bar{x}_i^{(g)}\right)\left(x_{jr}^{(g)} - \bar{x}_j^{(g)}\right) = w_{ij}, \qquad \text{(H-70)}$$

where $[w_{ij}]$ is the within-samples scatter matrix defined in Equation H-44. In terms of $d = (d_1, d_2, \ldots, d_N)'$ with $d_i = \bar{x}_i^{(1)} - \bar{x}_i^{(2)}$,

$$D_N^2 = d'S^{-1}d = (n_1 + n_2 - 2)\, d'W^{-1}d. \qquad \text{(H-71)}$$


Hotelling's $T^2$, a generalization of Student's $t$ for two multivariate samples, is related to $D_N^2$ by

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\, D_N^2. \qquad \text{(H-72)}$$


* Rao, C.R., Advanced Statistical Methods in Biometric Research. New York, John Wiley, 1952, Chap. 7.

** Wilks, S.S., Mathematical Statistics. New York, John Wiley and Sons, 1962, Chap. 18.

To test the hypothesis that there is no difference in the mean values of the $N$ characteristics for the two populations, the statistic

$$F = \frac{n_1 + n_2 - N - 1}{N\,(n_1 + n_2 - 2)}\, T^2 \qquad \text{(H-73)}$$

is used. For the case where both samples are independently drawn from identical $N$-dimensional normal distributions, i.e., under the null hypothesis, the statistic of Equation H-73 has an $F$ distribution with $N$ and $(n_1 + n_2 - N - 1)$ degrees of freedom.
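As a sketch (Python with NumPy, hypothetical data), the quantities of Equations H-71 to H-73 can be computed together. Equation H-73 is written here in the standard form $F = \frac{n_1+n_2-N-1}{N(n_1+n_2-2)}T^2$. With a single variable ($N = 1$), both $T^2$ and $F$ reduce to the square of the pooled two-sample Student $t$, which gives a convenient check. The function name is illustrative.

```python
import numpy as np

def two_sample_statistics(X1, X2):
    """D_N^2 (H-71), Hotelling's T^2 (H-72) and the F statistic (H-73) with its degrees of freedom."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2, N = len(X1), len(X2), X1.shape[1]
    W = sum((X - X.mean(0)).T @ (X - X.mean(0)) for X in (X1, X2))  # H-70
    d = X1.mean(0) - X2.mean(0)
    D2 = (n1 + n2 - 2) * d @ np.linalg.solve(W, d)
    T2 = n1 * n2 / (n1 + n2) * D2
    F = (n1 + n2 - N - 1) / (N * (n1 + n2 - 2)) * T2
    return D2, T2, F, (N, n1 + n2 - N - 1)

# With N = 1 the statistics should reduce to the square of the pooled two-sample Student t
rng = np.random.default_rng(6)
x1, x2 = rng.normal(size=(9, 1)) + 0.8, rng.normal(size=(11, 1))   # hypothetical samples
D2, T2, F, dof = two_sample_statistics(x1, x2)
sp2 = (np.sum((x1 - x1.mean())**2) + np.sum((x2 - x2.mean())**2)) / (9 + 11 - 2)
t = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1/9 + 1/11))
print(T2, F, t**2, dof)   # T2, F and t**2 coincide when N = 1; dof = (1, 18)
```

A significance level would then be read from the $F$ distribution with the returned degrees of freedom; that step is not shown.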


For $G$ samples, $G > 2$, of size $n_g$, $g = 1, 2, \ldots, G$, Rao's generalization of Mahalanobis' $D^2$ is given by

$$D_{N,G}^2 = \sum_{g=1}^{G} n_g \sum_{i=1}^{N} \sum_{j=1}^{N} s^{ij} \left(\bar{x}_i^{(g)} - \bar{x}_i\right)\left(\bar{x}_j^{(g)} - \bar{x}_j\right), \qquad \text{(H-74)}$$

where $[s^{ij}]$ is now the inverse of the pooled within-groups covariance matrix $[s_{ij}] = [w_{ij}]/(n - G)$, with $w_{ij}$ as in Equation H-61, $n = \sum_{g=1}^{G} n_g$, and $\bar{x}_i$ the mean of the pooled sample. Equation H-74 can be rewritten as

$$D_{N,G}^2 = (n - G) \sum_{i=1}^{N} \sum_{j=1}^{N} w^{ij} b_{ij}, \qquad \text{(H-75)}$$

with $b_{ij}$ as in Equation H-62. When the total sample size $n = \sum_{g=1}^{G} n_g$ is large, $D_{N,G}^2$ as defined in Equation H-75 has approximately a $\chi^2$ distribution with $N(G-1)$ degrees of freedom under the null hypothesis of no difference between the mean values of the $N$ characteristics in all the $G$ populations. This statistic can be used for testing whether observations on $Q$ additional variables can be useful for increasing the distance between the $G$ samples. Rao suggests the use of $\left(D_{N+Q,G}^2 - D_{N,G}^2\right)$ to judge the significance of information supplied by the additional $Q$ variables. For very large sample sizes this difference has, approximately, a $\chi^2$ distribution with $Q(G-1)$ degrees of freedom. As in the case of two groups, a step-by-step screening of the variables can be done using this statistic.


11. The Symmetric Divergence for Binary Variables

A general measure of the divergence between two populations is obtained by taking the difference between expected values of $\log L(x)$ with respect to the probability functions of Group 1 and Group 2, respectively. Thus,

$$J = E_1\left[\log \frac{f_1(x)}{f_2(x)}\right] - E_2\left[\log \frac{f_1(x)}{f_2(x)}\right]. \qquad \text{(H-76)}$$


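$J$ is also called the Kullback-Leibler divergence; it is the sum of the two directed Kullback-Leibler information numbers. Note that for the case where the two populations are multivariate normal with different mean vectors but equal covariance matrices, the symmetric divergence $J$ gives Mahalanobis' $D^2$ for two populations, i.e.,

$$D^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} v^{ij} \left(m_i^{(1)} - m_i^{(2)}\right)\left(m_j^{(1)} - m_j^{(2)}\right), \qquad \text{(H-77)}$$

where $[v^{ij}]$ is the inverse of the common covariance matrix and $m_i^{(g)}$ is the mean of $x_i$ in Group $g$.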

For the case where the variables $x_1, x_2, \ldots, x_N$ are binary, in terms of the notation of Section 4 of this Appendix, an approximation to the effective distance between the two groups is given by*


* Bahadur, R.R., "On Classification Based on Responses to N Dichotomous Items," USAF SAM Series in Statistics, Randolph AFB, Texas, 1959; appears in Studies in Item Analysis and Prediction, Ed. Herbert Solomon, Stanford U. Press, 1961.

(H-78)

where $a_i$ is as defined in Equation H-22. $D^*$ does not satisfy the triangle inequality and so is not a metric. It represents a "good" approximation to the effective distance when the two population distributions are not very different and the predictors are not highly correlated within either group.* A rough idea of the usefulness of a set of predictors for discrimination can be obtained by using the first two terms for $D^{*2}$.
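Under the first-order (independence) approximation of Section 4, substituting Equation H-21 into Equation H-76 gives $J = \sum_i a_i \left(m_i^{(1)} - m_i^{(2)}\right)$, since the $c_i$ cancel and the expected value of $x_i$ in Group $g$ is $m_i^{(g)}$. This is a derivation added for illustration; it is not Bahadur's $D^{*2}$ of Equation H-78, whose form is not reproduced here. The Python sketch checks the closed form against direct enumeration for hypothetical marginal probabilities.

```python
from itertools import product
from math import log, prod

def independent_pmf(x, m):
    """Product of independent Bernoulli probabilities (the first-order approximation)."""
    return prod(mi if xi else 1 - mi for xi, mi in zip(x, m))

def divergence_by_enumeration(m1, m2):
    """Equation H-76: E_1[log L] - E_2[log L] = sum over x of (f_1 - f_2) log(f_1 / f_2)."""
    total = 0.0
    for x in product((0, 1), repeat=len(m1)):
        f1, f2 = independent_pmf(x, m1), independent_pmf(x, m2)
        total += (f1 - f2) * log(f1 / f2)
    return total

def divergence_first_order(m1, m2):
    """Closed form sum_i a_i (m_i^(1) - m_i^(2)), with a_i as in Equation H-22."""
    return sum(log(p * (1 - q) / (q * (1 - p))) * (p - q) for p, q in zip(m1, m2))

m1, m2 = [0.7, 0.2, 0.5, 0.9], [0.4, 0.3, 0.6, 0.8]   # hypothetical marginal probabilities
print(divergence_by_enumeration(m1, m2), divergence_first_order(m1, m2))
```

Each term $a_i(m_i^{(1)} - m_i^{(2)})$ is non-negative, because $a_i$ has the same sign as $m_i^{(1)} - m_i^{(2)}$; variables whose marginal probabilities differ little between the groups therefore contribute little to $J$.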


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and  was  responsible  for  research  design  of  receiver  circuits  in  Philco's  first 
laboratory  all-transistor  set.  This  involved  transistorization  of  video  amplifiers 
circuits;  deflection  oscillators,  drivers,  and  outputs;  audio  stages,  sound  i-f 
amplifiers  and  limiters;  sync  separators  and  AFC  synchronizing  circuits;  high- 
voltage  generation;  and  systems  design.  The  advanced  circuit  research 
culminated in an eight-inch portable set in 1957 and provided the groundwork
for the Philco "Safari" portable produced in 1959.

He  has  invented  and  tested  novel  circuit  techniques  in  which  multigrid 
tubes  were  used  to  combine  receiver  functions,  and  supervised  experiments 
in  which  tunnel  diodes  were  used  as  uhf  oscillators.  Several  patent  applications 
were  made  on  the  basis  of  this  work. 

As a result of a two-year stay in Norway, Mr. Taylor also has experience
in  the  design,  development  and  production  of  European-Standard  television 
receivers. 

Since  1961,  Mr.  Taylor  has  been  engaged  in  research  on  Information 
Storage and Retrieval and Artificial Intelligence. His recent activities have
been  in  implementation  studies  for  visual  image  processing  devices,  pattern 
recognition  systems,  and  adaptive  majority  logic.  He  holds  two  patents. 

Eight patent applications filed since 1960 are pending issue. They are
principally  concerned  with  television  techniques  and  novel  circuit  and  device 
inventions  for  signal  handling  and  pulse  processing  systems. 

He  is  a  member  of  Tau  Beta  Pi,  Eta  Kappa  Nu,  Phi  Kappa  Phi,  and  the 
American  Documentation  Institute. 
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JOHN  Z.  GRAYUM,  RESEARCH  SPECIALIST 


Mr.  Grayum  received  the  B.S.  degree  in  physics  from  St.  Joseph's 
College  in  1951  and  the  M.  A.  degree  in  physics  from  Temple  University  in 
1958.  He  has  done  graduate  work  in  electrical  engineering,  mathematics, 
and  physics  at  Temple  University  and  the  Drexel  Institute  of  Technology.  He 
now  is  doing  graduate  work  in  mathematics  at  the  University  of  Pennsylvania. 

After he joined Philco in 1955, Mr. Grayum participated in the development
of a transistorized rc active audio filter. He also helped develop passive
linear  time-delay  filters  and  delay-line  filters. 

Mr.  Grayum  has  directed  work  on  REentrant  DAta  Processors  (REDAP) 
and  has  made  detailed  system  analyses  of  sweep  integrators  and  iterators. 

Later,  he  directed  a  group  of  Industrial  and  Computer  Laboratory  engineers  in 
work  on  PCM  multiplex  equipment,  high-speed  switching  circuits,  satellite 
transmitters,  and  digital  data-handling  equipment. 

In  1958  he  began  conducting  and  directing  studies  of  model  communications, 
especially  as  related  to  secure  communications  systems.  This  work  provided 
significant improvements in encoding, detection and decision and sync-scanning
procedures. 

Mr.  Grayum  was  project  leader  of  studies  in  the  design  of  a  world  wide 
communication  system  using  large  passive  satellites  as  reflectors.  These 
studies included a consideration of A/J techniques. He was the project leader
and principal contributor of advanced studies to determine methods of reducing
the vulnerability of active satellite communications links to jamming and
spoofing by an intelligent opponent. On a Philco-sponsored program, he directed
studies of A/J techniques including "Methods of Spreading High Speed Data,"
"Coding of Higher Order Alphabets," "Variable Mode Communications,"
"The Evaluation and Comparison of F-T Dodging with the Philco Hybrid
Techniques" and "Methods of Sync Search."

More  recently  Mr.  Grayum  has  participated  in  fundamental  studies  of 
time-variable communication systems, including variable data rate and
feedback techniques, and sequential decision procedures, including hypothesis
testing  and  ranking. 

From  1951  to  1955,  Mr.  Grayum  was  a  Broadcast  Engineer  at  RCA, 
where  he  contributed  to  the  design  and  development  of  UHF  broadcast  antennas 
and  filters. 
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Mr.  Grayum  is  the  author  of  "Optimum  Decision  and  Scanning  Techniques 
for  Synchronization,  "  Natcom  Symposium  Record,  October  1-3,  1962,  pp. 
170-178. 

He  is  the  coauthor  of  an  oral  presentation:  "Advanced  Spread  Spectrum 
Studies,  "  presented  at  the  National  Security  Agency,  Fort  Meade,  Maryland, 
January  1961;  also  presented  at  CCDD,  Bedford,  Mass. ,  May  1961,  and  at  the 
Air  Force  Cambridge  Research  Laboratories,  Lexington,  Mass.,  June  1961, 
(coauthor: C. Gumacos).

Mr.  Grayum  is  a  member  of  Sigma  Pi  Sigma,  a  Senior  Member  of  the 
IEEE  and  is  listed  in  Who's  Who  in  American  Universities  and  Colleges. 


CONSTANTINE  GUMACOS,  RESEARCH  SPECIALIST 

Mr.  Gumacos  received  the  B. S.E.E.  degree  with  highest  honors  from 
the  Georgia  Institute  of  Technology  in  1951  and  has  done  graduate  work  at  the 
Moore  School  of  Electrical  Engineering  at  the  University  of  Pennsylvania. 

After  he  joined  Philco  in  1951,  Mr.  Gumacos  worked  on  sweep  integrators 
for the SG-6/SG-6b radar equipment and on microwave relay systems for
microwave and color-television communications. He also worked on low-noise cooled
crystal  receivers.  In  other  projects,  he  was  responsible  for  the  design  of  a 
data  processor  for  a  coherent  airborne  radar  system  with  high  azimuthal 
resolution and worked on the analysis of a velocity-shaped MTI data-processing
system. 


From October 1958 to January 1960, Mr. Gumacos contributed to advanced
studies of the Spread Eagle System, a jam-resistant data link communications
system, after which he participated in the systems design for a program for a
global  communication  network  which  uses  passive  spherical  satellites.  Later, 
he  participated  in  the  "Midas  Command  Link  Modulation"  study  and  the  "Midas 
Command  Link  Reliability"  study. 

Mr. Gumacos has done extensive work in the fields of coding and
synchronization for A/J communication systems. He participated in studies of codes
for  secure  communications,  using  the  concepts  of  modern  algebra.  Most 
recently  he  performed  a  study  of  the  theoretical  aspects  of  synchronizing  secure 
systems.  He  has  also  performed  numerous  analyses  of  modulation  techniques. 


Mr. Gumacos has published these papers:

"Advanced Coding Studies," Internal Philco Report, June 1960.

"Analysis  of  an  Optimum  Sync  Search  Procedure,  "  submitted  to  the 
IRE Transactions on Communication Systems.


"Analysis  of  Multiple  Frequency  Shift  Keying  with  Diversity,  "  Philco 
Internal  Report,  September  15,  1961  (coauthor:  Peter  M.  Hahn). 

"Advanced  Spread  Spectrum  Studies,  "  presented  at  the  National  Security 
Agency, Fort Meade, Maryland, January 1961; also presented at CCDD,
Bedford, Mass., May 1961, and at the Air Force Cambridge Research
Laboratories, Lexington, Mass., June 1961 (coauthor: J. Z. Grayum).


HENRY  G.  KELLETT,  SENIOR  ENGINEER 

Mr.  Kellett  is  a  member  of  the  Image  Recognition  Group  of  the  Advanced 
Technology Laboratory where he is engaged in the study of problems in
implementing an automatic imagery screening system. He is also engaged in the
development  of  linear  spatial  filtering  techniques  for  prenormalization  and 
screening of aerial reconnaissance data. After joining Philco in 1959, he worked
on the Data Conversion Program, in which he was responsible for the design and
development of the encoder portion of the equipment. For several months, he
also contributed to research on secure, jam-resistant, and private communication
systems, and he has recently been involved in advanced object recognition studies.

Mr.  Kellett  received  the  B.S.  degree  in  electrical  engineering  from  the 
University  of  New  Hampshire  in  1959,  and  now  is  doing  graduate  work  in 
electrical  engineering  at  the  University  of  Pennsylvania.  In  addition  to  his 
formal  education,  Mr.  Kellett  has  attended  courses  in  electronics  in  the  Air 
Force  and  at  the  evening  school  of  the  Georgia  Institute  of  Technology. 

Prior  to  obtaining  his  Baccalaureate  degree,  Mr.  Kellett  was  employed 
in  several  technical  positions  in  industry  and  in  the  Air  Force. 

He  is  a  member  of  the  IEEE  and  its  Professional  Group  on  Information 
Theory. 


JERRY  R.  RICHARDS,  SENIOR  ENGINEER 


Mr.  Richards  is  a  member  of  the  Image  Recognition  Group  of  the 
Advanced Technology Laboratory, where he is working on studies and computer
simulations of the logics used in pattern-recognition systems. Recently,
Mr. Richards investigated a logic for the detection of significant changes
in information in repeat-cover aerial photography. His work in this area
includes  the  preparation  of  computer  simulation  programs  for  the  recently 
concluded  Automatic  Video  Data  Analysis  and  Image  Change  Detection  projects. 
Mr.  Richards  also  has  served  as  a  consultant  on  computer  simulation  of 
learning-machine logic on a project investigating adaptive pattern-recognition
techniques. 

Mr.  Richards  was  granted  the  B.S.  degree  in  electrical  engineering  by 
the  Drexel  Institute  of  Technology  in  June  1959. 

During  his  student  years,  Mr.  Richards  worked  as  a  Cooperative  Student 
Engineer on such projects as the design of electrical furnaces and the
optimization of electrolysis. Since graduation, he has been a member of the Philco
Corporation.  He  has  helped  to  design  gamma-correcting  circuits,  has  evaluated 
photographic  material  for  use  in  a  project  to  reduce  the  redundancy  of  pictorial 
data  before  transmission,  has  studied  the  characteristics  of  switching  thin- 
magnetic  films,  and  has  participated  in  the  design  of  the  video  processor  for 
an  electronic,  variable-font  address  reader  being  built  for  the  U.S.  Post  Office 
Department.  He  also  has  studied  video-processing  techniques  as  applied  to 
aerial  photographs. 

Mr.  Richards  is  a  Member  of  the  IEEE. 


JEROME  I.  MANTELL,  PHYSICIST 

Mr. Mantell received the B.A. degree in physics from the University of
Pennsylvania  in  1961.  He  now  is  working  toward  the  master's  degree  at  the 
same  university. 

In August 1960, Mr. Mantell joined the Philco Research Division, after
which  he  performed  experimental  and  theoretical  studies  on  the  field  properties 
of  magnesium  oxide,  with  emphasis  on  the  mechanism  of  secondary  electron 
emission.  The  studies  included  investigation  of  the  properties  of  magnesium 
oxide  in  air  and  vacuum.  Mr.  Mantell  prepared  several  patent  disclosures  for 
a  magnesium  oxide,  single  crystal  vacuum  tube  employing  cold-cathode  emission. 
He constructed a simple theoretical model of the secondary emission phenomenon
in terms of field effects.


From November 1960 to February 1961, Mr. Mantell did experimental
work in which an electrochemical light valve was used both as a display and as
an integral part of a logic network. This work included studies of various
properties of the electroplating cell, such as efficiency and light
transmission. He also studied ways of obtaining a bistable mode of operation.

Since  February  1961,  Mr.  Mantell  has  performed  theoretical  experiments 
for  optical  logic  systems,  including  programming,  design  and  construction. 

Mr. Mantell now is investigating and constructing electroluminescent-photocell
bistable circuits and is working on a preliminary investigation of new kinds of
circuit  elements  for  learning  machines. 

Mr.  Mantell  is  a  Member  of  the  American  Physical  Society  and  Pi  Sigma. 


HANS  P.  DOMABYL,  ENGINEER 

Hans P. Domabyl received the "Diplom-Ingenieur" degree in Electrical
Engineering from the Munich Institute of Technology in 1957 and was granted
a similar degree in Economics in 1959 after postgraduate studies in Paris
and Munich.

He  worked  for  two  years  as  an  application  and  sales  engineer  for  the 
Ampex  Corporation  in  Germany,  where  he  was  concerned  with  problems  of 
adapting  magnetic  tape  recorders  for  special  instrumentation  and  telemetry 
systems.  Mr.  Domabyl  joined  the  Philco  Scientific  Laboratory  in  1962  and 
received training in programming the Philco 2000 computer system. At the
present  time,  he  is  writing  computer  programs  for  automatic  data  recognition. 

Mr. Domabyl is a member of the VDE, the German Institute of Electrical
Engineers.


THOMAS J. B. SHANLEY, MANAGER, RECOGNITION LABORATORY

Dr. Shanley received the B.S. degree in engineering from the U.S.
Military Academy in 1939. He did graduate work in cosmic ray physics at
Princeton University, which granted him the Ph.D. degree in 1951. The title
of his doctoral thesis is "Gamma Ray Production in Mu Meson Capture."

Dr.  Shanley  joined  the  Philco  Corporation  in  1961  and  is  now  directing 
the  research  and  advanced  development  work  in  the  Advanced  Technology 
Laboratory  on  visual  pattern  recognition  and  speech  recognition. 

From  1939  until  he  joined  Philco,  Dr.  Shanley  served  in  the  U.S.  Army. 
He rose from Second Lieutenant to Colonel during his Army career. He
commanded a company in the original Army parachute unit, the 501st Parachute
Battalion, from 1939 to 1946. During this period, he pioneered the
development of mass airborne drop techniques. In 1943, he was promoted to Lieutenant
Colonel to command a parachute infantry battalion of the 82nd Airborne Division
in the airborne assault on Normandy, and he commanded a regiment in combat. He
received  a  Silver  Star,  two  Bronze  Stars,  and  the  Purple  Heart. 

From 1950 to 1953, Dr. Shanley was the Chief of the Research Division in
the  Research  and  Development  Section  of  the  Office  of  the  Chief  of  Army  Field 
Forces.  In  this  capacity,  he  formulated  requirements  for  new  Army  weapons 
systems  and  conducted  and  reviewed  studies  of  the  effectiveness  of  weapons. 

He  originated  the  "Combat  Development"  concept  adopted  by  the  Army  in  1953, 
by  which  weapons,  doctrine,  and  organizational  concepts  are  conceived, 
developed,  and  tested  concurrently. 

From  1954  to  1955,  Dr.  Shanley  commanded  the  19th  Infantry  Regiment, 
24th Infantry Division, in Korea. From 1958 to 1960, he was the Chief of the
Atomic-Chemical and Biological Warfare Division in the Office of the Deputy
Chief  of  Staff  for  Military  Operations,  Department  of  the  Army  Headquarters. 

In this capacity, he directed a group of Army officers in determining
requirements for, and planning the use of, nuclear delivery systems and chemical
and biological weapons. In 1960, Dr. Shanley was assigned to the Office of the
Joint  Chiefs  of  Staff.  He  established  requirements  and  prepared  plans  for  the 
use  of  nuclear  weapons  and  guided  missiles. 

Dr. Shanley's publications include the following:

"Influence  on  the  Cosmic  Ray  Spectrum  of  Five  Heavenly  Bodies,  "  Review 
of  Modern  Physics,  Yol.  21,  No.  1,  January  1949,  pp.  51  to  7 1  (coauthors: 

J.  A.  Wheeler  and  E.  O.  Kane). 

"A Preliminary Directional Study of Cosmic Rays at High Altitude,"
Physical  Review,  Vol.  76,  No.  8,  October  15,  1949,  pp.  1005  to  1019  (coauthors: 
J.  R.  Winckler  and  W.  G.  Stroud). 

"Gamma  Rays  from  Negative  Mu  Mescn  Capture  in  Lead,  "  Physical  Review, 
Vol.  89,  No.  5,  March  1953,  pp.  983  to  990  (coauthor:  G.  G.  Harris). 

"Evaluation  of  Weapons,  Tactics,  and  Organizational  Concepts,  "  Military 
Review.  July  1954,  pp.  31  to  36. 

"Non-Nuclear  NATO  Army, "  Army,  Vol.  11,  No.  5,  December  I960, 
pp.  28  to  30. 

Dr.  Shanley  is  a  member  of  Sigma  Xi,  the  Operations  Research  Society, 
the  Association  of  the  U.  S.  Army,  and  the  IEEE. 